ABSTRACT



A system and method for content-based search and retrieval 
of visual objects. A base visual information retrieval (VIR) 
engine utilizes a set of universal primitives to operate on the 
visual objects. An extensible VIR engine allows custom, 
modular primitives to be defined and registered. A custom 
primitive addresses domain specific problems and can utilize 
any image understanding technique. Object attributes can be 
extracted over the entire image or over only a portion of the 
object. A schema is defined as a specific collection of
primitives. A specific schema implies a specific set of visual
features to be processed and a corresponding feature vector
to be used for content-based similarity scoring. A primitive
registration interface registers custom primitives and facili- 
tates storing of an analysis function and a comparison 
function to a schema table. A heterogeneous comparison
allows objects analyzed by different schemas to be com- 
pared if at least one primitive is in common between the 
schemas. A threshold-based comparison is utilized to 
improve performance of the VIR engine. A distance between 
two feature vectors is computed in any of the comparison 
processes so as to generate a similarity score. 

10 Claims, 16 Drawing Sheets 
THRESHOLD-BASED COMPARISON

RELATED APPLICATIONS 

This application claims the benefit of the filing date of 
U.S. patent application Ser. No. 60/014,893, filed Mar. 29,
1996, for "SIMILARITY ENGINE FOR CONTENT-BASED
RETRIEVAL OF OBJECTS", to Jain, et al.

MATERIAL SUBJECT TO COPYRIGHT 
PROTECTION 

A portion of the disclosure of this patent document 
contains material which is subject to copyright protection. 
The copyright owner has no objection to the facsimile 
reproduction by anyone of the patent document or the patent 
disclosure, as it appears in the Patent and Trademark Office 
patent file or records, but otherwise reserves all copyright 
rights whatsoever. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to visual information 
retrieval systems. More specifically, the invention is directed 
to an extensible system for retrieval of stored visual objects 
based on similarity of content to a target visual object. 

2. Description of the Related Technology 

One of the most important technologies needed across 
many traditional and emerging applications is the manage- 
ment of visual information. Every day we are bombarded 
with information presented in the form of images. So 
important are images in our world of information 
technology, that we generate literally millions of images 
every day, and this number keeps escalating with advances 
in imaging, visualization, video, and computing technolo- 
gies. 

It would be impossible to cope with this explosion of 
image information, unless the images were organized for 
rapid retrieval on demand. A similar situation occurred in the 
past for numeric and other structured data, and led to the 
creation of computerized database management systems. In 
these systems, large amounts of data are organized into fields 
and important or key fields are used to index the databases,
making search very efficient. These information management
systems have changed several aspects of modern
society. These systems, however, are limited by the fact that
they work well only with numeric data and short alpha- 
numeric strings. Since so much information is in non- 
alphanumeric form (such as images, video, speech), to deal 
with such information, researchers started exploring the 
design and implementation of visual databases. But creation
of mere image repositories is of little value unless there are
methods for fast retrieval of objects such as images based on
their content, ideally with an efficiency that we find in
today's databases. One should be able to search visual
databases with visual-based queries, in addition to
alphanumeric queries. The fundamental problem is that images,
video and other similar data differ from numeric data and
text in format, and hence they require a totally different
technique of organization, indexing, and query processing.
One needs to consider the issues in visual information 
management, rather than simply extending the existing 
database technology to deal with images. One must treat 
images as one of the central sources of information rather 
than as an appendix to the main database. 

A few researchers have addressed problems in visual 
databases. Most of these efforts in visual databases, 
however, focussed either on only a small aspect of the
problem, such as data structures or pictorial queries, or on a
very narrow application, such as databases for pottery
articles of a particular tribe. Other researchers have
developed image processing shells which use several images.
Clearly, visual information management systems encompass
not only databases, but aspects of image processing and
image understanding, very sophisticated interfaces,
knowledge-based systems, compression and decompression
of images. Moreover, memory management and organization
issues start becoming much more serious than in the
largest alphanumeric databases.

A significant event in the world of information systems in 
the past few years is the development of multimedia
information systems. A multimedia information system goes
beyond traditional database systems to incorporate various
modes of non-textual digital data, such as digitized images
and videos, in addition to textual information. It allows a
user the same (or better) ease of use and flexibility of storage
and access as traditional database systems. Today, thanks to
an ever-increasing number of application areas like stock
photography, medical imaging, digital video production,
document imaging and so forth, gigabytes of image and
video information are being produced every day. The need
to handle this information has resulted in new technological
requirements and challenges: 

Image and video data are much more voluminous than 
text, and need supporting technology for rapid and 
efficient storage and retrieval. 
There are several different modes in which a user would
search for, view, and use images and videos. 
Even if multimedia information resides on different com- 
puters or locations, it should easily be available to the 
user. 

Thus, representation, storage, retrieval, visualization and
distribution of multimedia information is now a central 
theme both in the academic community and industry alike. 
What is needed is a capability to manage this information. 
In traditional database systems, users search images by
keywords or descriptions associated with the visual
information. In a traditional database management system
(DBMS), an image is treated as a file name, or the raw image
data exists as a binary large object (BLOB). The limitation
is clear: a file name or the raw image data is useful for
displaying the image, but not for describing it. In some
applications, these shortcomings were overcome by having
a person participate in the process by interpreting and
assigning keyword descriptions to images. However, textual
descriptors such as a set of keywords are also inadequate to
describe an image, simply because the same image might be
described in different ways by different people. What is
needed is a new multimedia information system technology
model such as a visual information management system
(VIMSYS) model. Unlike traditional database systems, this
model recognizes that most users prefer to search image and
video information by what the image or video actually
contains, rather than by keywords or descriptions associated
with the visual information. The only proper method by
which the user can get access to the content of the image is
by using image-analysis technology to extract the content
from an image or video. Once extracted, the content
represents most of what the user needs in order to organize,
search, and locate necessary visual information.

This breakthrough concept of content extraction alleviates
several technological problems. The foremost benefit is that
it gives a user the power to retrieve visual information by
asking a query like "Give me all pictures that look like this."
The system satisfies the query by comparing the content of
the query picture with that of all target pictures in the
database. This is called Query By Pictorial Example
(QBPE), and is a simple form of content-based retrieval, a
new paradigm in database management systems.

Over the last five years research and development in 
content-based retrieval of visual information has made sig- 
nificant progress. Academic research groups have developed 
techniques by which images and videos can be searched 
based on their color, texture, shape and motion characteristics.
Commercial systems supporting this technology, such
as Ultimedia Manager from IBM, and the Visual Intelligence 
Blade from Illustra Information Technologies, Inc. are 
beginning to emerge. 

A typical content-based retrieval system might be
described as follows: image features are precomputed during
an image insertion phase. These representations may include
characteristics such as local intensity histograms, edge
histograms, region-based moments, spectral characteristics,
and so forth. These features are then stored in a database as
structured data. A typical query involves finding the images
which are "visually similar" to a given candidate image. In
order to submit a query, a user presents (or constructs) a
candidate image. This query image may already have
features associated with it (i.e., an image which already exists
within the database), or may be novel, in which case a
characterization is performed "on the fly" to generate
features. Once the query image has been characterized, the
query executes by comparing the features of the candidate
image against those of other images in the database. The
result of each comparison is a scalar score which indicates
the degree of similarity. This score is then used to rank order
the results of the query. This process can be extremely fast
because image features are pre-computed during the
insertion phase, and distance functions have been designed to be
extremely efficient at query time. There are many variants on
this general scheme, such as allowing the user to express
queries directly at the feature level, combining images to
form queries, querying over regions of interest, and so forth.
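
The insertion-then-query flow described above can be illustrated with a
minimal Python sketch. It is not the engine of this disclosure: the
4-bucket intensity histogram, the L1 distance, and the names
`extract_features`, `insert_image`, and `query` are assumptions chosen
only for illustration, and an image is modeled as a flat list of 8-bit
gray levels.

```python
# Illustrative sketch only; the feature, the metric, and all names are assumptions.

def extract_features(pixels, bins=4):
    """Return a normalized intensity histogram for 8-bit gray levels."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [count / total for count in hist]


def distance(fv_a, fv_b):
    """L1 distance between two histograms (0.0 means identical)."""
    return sum(abs(a - b) for a, b in zip(fv_a, fv_b))


database = {}  # image id -> precomputed feature vector


def insert_image(image_id, pixels):
    """Insertion phase: features are computed once and stored."""
    database[image_id] = extract_features(pixels)


def query(pixels):
    """Query phase: characterize the candidate, then rank the stored images."""
    target = extract_features(pixels)
    scored = [(distance(target, fv), image_id) for image_id, fv in database.items()]
    return [image_id for _, image_id in sorted(scored)]


insert_image("dark", [10, 20, 30, 40])
insert_image("bright", [200, 220, 240, 250])
insert_image("mixed", [10, 20, 240, 250])
ranking = query([5, 15, 25, 35, 245])  # a mostly dark candidate image
```

Only the stored feature vectors are touched at query time, which is why
the comparison loop stays cheap even when the original images are large.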

General systems (using color, shape, etc.) are adequate for
applications with a broad image domain, such as generic
stock photography. In general, however, these systems are
not applicable to specific, constrained domains. It is not
expected, for example, that a texture similarity measure that
works well for nature photography will work equally well
for mammography. If mammogram databases need to be
searched by image content, one would need to develop
specific features and similarity measures. This implies that a
viable content-based image retrieval system will have to
provide a mechanism to define arbitrary image domains and
allow a user to query on a user-defined schema of image
features and similarity metrics.

There is a need to provide a way to compare images
represented by different schemas. There is also a need to
reduce the time spent performing the comparison, especially
when large numbers of images are in the database.

SUMMARY OF THE INVENTION 

The above needs are satisfied by the present invention
which is directed to a system and method for "content-based"
image retrieval, a technique which explicitly manages
image assets by directly representing their visual
attributes. A visual information retrieval (VIR) Engine
provides an open framework for building such a system. A
visual feature is any property of an image that can be
computed using computer-vision and image-processing
techniques. Examples are hue, saturation, and intensity
histograms; texture measures such as edge density,
randomness, periodicity, and orientation; shape measures
such as algebraic moments, turning angle histograms, and
elongatedness. Some of these features are computed
globally, i.e., over an entire image, and some are local, i.e.,
computed over a small region in the image. The VIR Engine
expresses visual features as image "primitives". Primitives
can be very general (such as color, shape, or texture) or quite
domain specific (face recognition, cancer cell detection,
etc.). The basic philosophy underlying this architecture is a
transformation from the data-rich representation of explicit
image pixels to a compact, semantic-rich representation of
visually salient characteristics. In practice, the design of
such primitives is non-trivial, and is driven by a number of
conflicting real-world constraints (e.g., computation time vs.
accuracy). The VIR Engine provides an open framework for
developers to "plug-in" primitives to solve specific image
management problems.
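
The "plug-in" idea can be pictured with a short Python sketch in which
a primitive is a named pair of functions, one that analyzes an image and
one that compares the resulting data. The classes `Primitive` and
`Schema`, the `register` method, and the two toy primitives are
hypothetical names introduced here for illustration; they are not the
interface of the VIR Engine.

```python
# Hypothetical sketch: Primitive, Schema, and every function below are
# illustrative assumptions, not the VIR Engine's actual interface.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each channel 0-255


@dataclass
class Primitive:
    name: str
    analyze: Callable[[List[Pixel]], List[float]]         # image -> primitive data
    compare: Callable[[List[float], List[float]], float]  # two data sets -> distance


@dataclass
class Schema:
    """A specific collection of primitives."""
    primitives: Dict[str, Primitive] = field(default_factory=dict)

    def register(self, primitive: Primitive) -> None:
        self.primitives[primitive.name] = primitive

    def analyze(self, image: List[Pixel]) -> Dict[str, List[float]]:
        return {name: p.analyze(image) for name, p in self.primitives.items()}


def mean_color(image):
    """A general primitive: average of each color channel."""
    n = len(image)
    return [sum(px[c] for px in image) / n for c in range(3)]


def redness(image):
    """A toy domain-specific primitive: fraction of strongly red pixels."""
    red = sum(1 for r, g, b in image if r > 200 and g < 80 and b < 80)
    return [red / len(image)]


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


schema = Schema()
schema.register(Primitive("mean_color", mean_color, l1))
schema.register(Primitive("redness", redness, l1))
features = schema.analyze([(255, 0, 0), (0, 0, 255)])
```

Keeping the analysis and comparison functions together under one name
means a new domain can be supported by registering another pair, without
changing the code that walks the schema.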

Various types of visual queries are supported by the VIR 
Engine as follows: 

Query by image property, wherein a user specifies a
property or attribute of the image, such as the arrange-
ment of colors, or they may sketch an object and
request the system to find images that contain similar
properties. The Engine also allows the user to specify
whether or not the location of the property in the image
(e.g., blue at the bottom of the image or blue anywhere)
is significant.
Query by image similarity, wherein a user provides an
entire image as a query target and the system finds
images that are visually similar.
Query refinement or systematic browsing. With any of the
previous modes of query, the system produces some 
initial results. A browsing query is one that refines the 
query by either choosing an image from the previous 
result set, or by modifying the parameters of the 
original query in some way. The system in this situation 
reuses the previous results to generate refined results. 
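
One way to read the refinement mode is that per-primitive distances
from the initial query are kept and simply re-weighted, so no image is
analyzed again. The Python sketch below follows that reading; it is an
assumption, as are the function name `rank`, the primitive names, and
the made-up distance values.

```python
# Hypothetical sketch: the distances, weights, and names are invented for illustration.

def rank(candidates, weights):
    """Order image ids by weighted sum of per-primitive distances (smallest first).

    candidates: dict image id -> dict primitive name -> distance to the query
    weights:    dict primitive name -> relative importance
    """
    def total(image_id):
        return sum(weights[name] * d for name, d in candidates[image_id].items())
    return sorted(candidates, key=total)


# Distances computed once, during the initial query.
initial = {
    "sunset": {"color": 0.1, "texture": 0.9},
    "brick": {"color": 0.8, "texture": 0.1},
    "beach": {"color": 0.3, "texture": 0.4},
}
first_pass = rank(initial, {"color": 1.0, "texture": 1.0})

# Refinement: keep the two best results and re-rank with texture emphasized.
kept = {image_id: initial[image_id] for image_id in first_pass[:2]}
refined = rank(kept, {"color": 0.2, "texture": 1.0})
```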
An important concept in content-based retrieval is to
determine how similar two pictures are to one another. The
notion of similarity (versus exact matching as in database
systems) is appropriate for visual information because
multiple pictures of the same scene will not necessarily "match,"
although they are identical in content. In the paradigm of
content-based retrieval, pictures are not simply matched, but
are ranked in order of their similarity to the query image.
Another benefit is that content extraction results in very high
information compression. The content of an image file may
be expressed in as little as several hundred bytes of memory,
regardless of the original image size. As an image is inserted
into a VIMSYS database, the system extracts the content in
terms of generic image properties such as its color, texture,
shape and composition, and uses this information for all
subsequent database operations. Except for display, the
original image is not accessed. Naturally, the VIMSYS
model also supports textual attributes as do all standard
databases.

The VIR technology improves query success in many 
applications where images are collected, stored, retrieved, 
compared, distributed, or sold. Some applications for VIR 
technology include: managing digital images by stock photo 
agencies, photographers, ad agencies, publishers, libraries, 
and museums; managing digital video images for production 
houses and stock-footage providers; visually screening or 
comparing digital images in medicine and health care; 
searching files of facial images for law enforcement, credit 
card, or banking applications; satellite imaging;
manufacturing test and inspection; manufacturing defect classifica-
tion; and browsing an electronic catalog for on-line shop- 
ping. 

In one aspect of the invention, there is a method of visual
object comparison for a database of visual objects, compris-
ing the steps of: a) applying primitives to a first visual object
to extract a first feature vector, each primitive providing at
least one primitive value to the first feature vector; b)
applying primitives to a second visual object to extract a
second feature vector, each primitive providing at least one
primitive value to the second feature vector; c) providing an
ordering value for each primitive to order the primitives; d)
comparing one of the primitive values from the first feature
vector with the corresponding primitive value of the second
feature vector according to the ordering so as to obtain a
primitive score; e) applying a primitive weight to the
primitive score to determine a weighted primitive score; f)
summing the weighted primitive score into a summed total
score; and g) repeating steps d-f until the summed total
score exceeds a threshold.
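
Steps d) through g) can be sketched in Python as a loop with an early
exit. This is a sketch under stated assumptions rather than the claimed
implementation: each primitive score is treated as a distance (a larger
total means less similar), the loop also ends when the primitives run
out, and the function name `threshold_compare`, the tuple layout, and
the example values are all hypothetical.

```python
# Sketch under stated assumptions; names and data layout are hypothetical.

def threshold_compare(fv_a, fv_b, primitives, threshold):
    """Compare two feature vectors one primitive at a time.

    fv_a, fv_b : dict mapping primitive name -> list of floats
    primitives : list of (order, name, weight, compare_fn) tuples
    Returns (total, compared): the summed weighted score and the number of
    primitives evaluated before the total exceeded the threshold.
    """
    total = 0.0
    compared = 0
    for _order, name, weight, compare_fn in sorted(primitives, key=lambda p: p[0]):
        primitive_score = compare_fn(fv_a[name], fv_b[name])  # step d
        total += weight * primitive_score                     # steps e and f
        compared += 1
        if total > threshold:                                 # step g: stop early
            break
    return total, compared


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


fv_a = {"color": [1.0, 0.0], "texture": [0.5], "shape": [0.2]}
fv_b = {"color": [0.0, 1.0], "texture": [0.5], "shape": [0.9]}
primitives = [(2, "texture", 1.0, l1), (1, "color", 3.0, l1), (3, "shape", 1.0, l1)]

early_total, early_count = threshold_compare(fv_a, fv_b, primitives, threshold=5.0)
full_total, full_count = threshold_compare(fv_a, fv_b, primitives, threshold=100.0)
```

With the lower threshold only the first primitive in the ordering is
evaluated, which is where the time saving comes from when most stored
images differ clearly from the query.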

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will be described in further detail
with reference to the accompanying drawings, in which: 

FIG. 1A is a block diagram of the modules of one 
embodiment of a visual information retrieval (VIR) system. 

FIG. 1B is a block diagram of a hardware configuration
for the VIR system of FIG. 1A. 

FIG. 2 is an exemplary screen display seen while execut- 
ing the query canvas module 108 shown in FIG. 1A. 

FIG. 3 is an exemplary screen display seen during
execution of the alphanumeric query input module 106, or sub-
sequent to execution of the query canvas module 108 or 
image browsing module 110 shown in FIG. 1A. 

FIG. 4 is an exemplary screen display seen while execut- 
ing the thumbnail results browser 136 shown in FIG. 1A. 

FIGS. 5A and 5B are a high-level flow diagram showing
the operation of the VIR system shown in FIG. 1A which 
includes the Base VIR Engine. 

FIG. 6 is a block diagram showing the components of the 
Extensible VIR Engine. 

FIG. 7 is a block diagram of an exemplary VIR system 
utilizing the Extensible VIR Engine of FIG. 6. 

FIG. 8 is a high level flowchart of the operation of the 
Extensible VIR Engine shown in FIG. 6. 

FIG. 9 is a flow diagram of portions of another embodi- 
ment of a VIR system utilizing the Extensible VIR Engine 
of FIG. 6. 

FIG. 10 is a flowchart of the run analyzer function 366 
shown in FIG. 8. 

FIG. 11 is a flowchart of the standard comparison function 
396 shown in FIG. 9. 

FIG. 12 is a flowchart of the threshold comparison func- 
tion 398 shown in FIG. 9. 

FIG. 13 is a flowchart of a schema creation and primitive 
registration function which is performed, in part, by the 
primitive registration interface 306 shown in FIG. 6. 

FIG. 14 is a flowchart of a top "N" query function 
performed by either the Base VIR Engine of FIG. 1A or the
Extensible VIR Engine shown in FIG. 6. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENT 

The following detailed description of the preferred 
embodiment presents a description of certain specific 
embodiments of the present invention. However, the present
invention can be embodied in a multitude of different ways
as defined and covered by the claims. In this description,
reference is made to the drawings wherein like parts are
designated with like numerals throughout.

For convenience, the discussion of the preferred
embodiment will be organized into the following principal sections:
Introduction and Model, Base VIR Engine and System,
Extensible VIR Engine and System, Applications, and
Application Development.

I. INTRODUCTION AND MODEL 

The VIR Engine is a library-based tool kit that is delivered
in binary form (an object library with header file interfaces)
on various platforms, and provides an American National
Standards Institute (ANSI) "C" language interface to the
application developer. It provides access to the technology
of Visual Information Retrieval (VIR), which allows images
to be mathematically characterized and compared to one
another on the basis of "visual similarity". Applications may
now search for images or rank them based on "what they
look like". The VIR Engine looks at the pixel data in the
images, and analyzes the data with respect to visual
attributes such as color, texture, shape, and structure. These
visual attributes are called "primitives", and the image
characterization is built up from these. Images which have
been analyzed may then be compared mathematically to
determine their similarity value or "score". Images are
analyzed once, and the primitive data is then used for fast
comparisons.

A first embodiment of the invention provides a "Base VIR
Engine API" which has a fixed set of visual primitives, and
the necessary calls for analyzing and comparing images. A
second embodiment of the invention provides an
"Extensible VIR Engine API" which gives application developers
the ability to create new visual primitives for specialized,
vertical applications. This enables application developers to
capture higher level semantic information about the images
being analyzed, and create intelligent applications in specific
domains.

The main functions of the Base Engine application
programming interface (API) are: initialization and global
definitions, image analysis functions, similarity comparison
functions, scoring functions, and weights management. In
addition to the functionality of the Base Engine, the
Extensible Engine API also has primitive registration and schema
management. The entry points for these functions are
defined in regular "C" header files.

The VIR Engine has a "stateless" architecture in which all
of the data about images is managed and stored by the
application. Applications are responsible for passing "raw"
image data (e.g., red, green, blue (RGB) format buffers) into
the Engine, and then handling the feature data and scoring
information that is returned to the application by the Engine.
When a comparison is desired, the application passes the
feature data for a pair of images back to the Engine to obtain
a final score. Thus, all persistent data management, query set
management, and similar activities, are the responsibility of
the application developer. The Engine makes no
assumptions about storage methodologies, formats, list
management, or any information structures that require state
information.
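
The division of labor in a stateless design can be sketched in Python
as two pure functions plus a store that belongs to the application. The
functions `engine_analyze` and `engine_compare` merely stand in for
Engine entry points; their names, signatures, the packed-float blob
format, and the toy mean-color feature are assumptions made for this
sketch and are not taken from the Engine's header files.

```python
# Illustrative only: names, signatures, and the blob format are assumptions.
import struct


def engine_analyze(rgb_buffer: bytes) -> bytes:
    """Turn a raw interleaved RGB buffer into an opaque feature blob.

    The toy feature is the mean of each channel, packed as three floats.
    """
    n = len(rgb_buffer) // 3
    means = [sum(rgb_buffer[c::3]) / n for c in range(3)]
    return struct.pack("<3f", *means)


def engine_compare(blob_a: bytes, blob_b: bytes, weights=(1.0, 1.0, 1.0)) -> float:
    """Score two feature blobs; nothing is remembered between calls."""
    a = struct.unpack("<3f", blob_a)
    b = struct.unpack("<3f", blob_b)
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


# The application, not the engine, owns persistence and bookkeeping.
store = {}
store["red"] = engine_analyze(bytes([255, 0, 0] * 4))   # four red pixels
store["blue"] = engine_analyze(bytes([0, 0, 255] * 4))  # four blue pixels
score = engine_compare(store["red"], store["blue"])
```

Because the blobs are plain bytes, the application is free to keep them
in a database column, a file, or memory, which matches the statement
that the Engine makes no assumptions about storage.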

Similarity scoring is a comparison of images based on a
conceptual "feature space", where each image is a "point" in
this space. The similarity score is a number that represents
the abstract distance between two given images in this space.
Each visual primitive provides a component of the overall
similarity score; that is, each primitive provides its own
multi-dimensional feature space. An overall visual similarity
score is provided by combining the primitive scores in a way
that is visually meaningful. This is both application and user
dependent; therefore the Engine allows the application to
pass in a set of weightings that define the "importance" of
each primitive in computing the overall score. In the
presently preferred embodiment, the scores are normalized in the
range [0 . . . 100].

The Virage Model of Visual Information

Following the aforementioned VIMSYS data model for
visual information, Virage technology admits four layers of
information abstraction: the raw image (the Image
Representation Layer), the processed image (the Image Object
Layer), the user's features of interest (called the Domain
Object Layer) and the user's events of interest for videos or
other collections of sequenced images (the Domain Event
Layer). The top three layers form the content of the image
or video. A discussion of representing the abstracted
information by data types follows. The data types pertain to the
top three layers of the model.

Data Types

A content-based information retrieval system creates an
abstraction of the raw information in the form of features,
and then operates only at the level of the abstracted
information. In general, data types and representation issues are
only constrained by the language used for an
implementation.

One presently preferred implementation is as follows. For
visual information, features may belong to five abstract data
types: values, distributions, indexed values, indexed
distributions, and graphs. A value is, in the general case, a set
of vectors that may represent some global property of the
image. The global color of an image, for example, can be a
vector of RGB values, while the dominant colors of an
image can be defined as the set of k most frequent RGB
vectors in an image. A distribution, such as a color histogram
is typically defined on an n-dimensional space which has
been partitioned into b buckets. Thus, it is a b-dimensional
vector. An indexed value is a value local to a region of an
image or a time point in a video or both; as a data type it is
an indexed set of vectors. The index can be one-dimensional
as in the key-frame number for a video, or it can be
multi-dimensional as in the orthonormal bounding box
coordinates covering an image segment. An indexed distribution
is a local pattern such as the intensity profile of a region of
interest, and can be derived from a collection of
b-dimensional vectors by introducing an index. A graph
represents relational information, such as the relative spatial
position of two regions of interest in an image. We do not
consider a graph as a primary type of interest, because it can
be implemented in terms of the other four data types, with
some application-dependent rules of interpretation (e.g.
transitivity of spatial predicates, such as left-of).

It follows from the foregoing discussion that vectors form
a uniform base type for features representing image content.
In a presently preferred embodiment, the primary data type
in the VIR Engine is a (indexable) collection of feature
vectors (FVs).

Primitives

Image objects have computable image properties or
attributes that can be localized in the spatial domain
(arrangement of color), the frequency domain (sharp edge
fragments), or by statistical methods (random texture).
These computed features are called primitives. Primitives
are either global, computed over an entire image, or local,
computed over smaller regions of the image. For each
generic image property such as color, texture, and shape, a
number of primitives may be computed. Besides this
conceptual definition of a primitive, the specific implementation
may also be referred to as a primitive. For instance, the
collection of functions to extract and compare an image
attribute may be referred to as a primitive.

Distance Metrics

Since primitives are extracted by different computational
processes, they belong to different topological spaces, each
having different distance metrics defined for them.
Computationally, these metrics are designed to be robust to
small perturbations in the input data. Because the abstracted
image primitives are defined in topological spaces,
searching for similarity in any image property corresponds to
finding a (partial) rank order of distances between a query
primitive and other primitives in that same space. Also, since
the space of image properties is essentially
multidimensional, several different primitives are necessary
to express the content of an image. This implies that
individual distance metrics need to be combined into a
composite metric using a method of weighted contributions.

Primitive Weighting

The overall similarity between two images lies literally
"in the eye of the beholder." In other words, the perceptual
distance between images is not computable in terms of
topological metrics. The same user will also change his or
her interpretation of similarity depending on the task at
hand. To express this subjective element, the VIR interface
provides functions to allow the user to control which relative
combinations of individual distances satisfies his or her
needs. As the user changes the relative importance of
primitives by adjusting a set of weighting factors (at query
time), the VIR system incorporates the weight values into
the similarity computation between feature vectors.

The information model described above is central to the
system architecture. All other aspects such as the keywords
associated with images, the exact nature of data management
and so forth are somewhat secondary and depend on the
application environments in which the technology is used.
The software aspects of this core technology are explained
hereinbelow. An explanation of the different environments
in which the core model is embedded also follows.

II. BASE VIR ENGINE AND SYSTEM

The VIR system technology is built around a core module
called the VIR Engine and operates at the Image Object
Level of the VIMSYS model. There are three main
functional parts of the Engine: Image Analysis, Image
Comparison, and Management. These are invoked by an
application developer. Typically, an application developer
accesses them during image insertion, image query, and
image requery (a query with the same image but with a
different set of weighting factors). The function of each unit,
and how the application developer uses the VIR Application
Programming Interface (API) to exchange information with
the VIR Engine is described below. The full capabilities of
the Engine are decomposed into two API sets: a Base VIR
Engine, and an Extensible VIR Engine. The Base Engine
provides a fixed set of primitives (color, texture, structure,
etc.) while the Extensible Engine provides a set of
mechanisms for defining and installing new primitives (discussed
in detail later).

Base System Modules

Referring to FIG. 1A, the modules of an embodiment of
a visual information retrieval (VIR) system 100 that utilizes
the Base VIR Engine 120 will be described. A user 102
communicates with the system 100 by use of computer
input/output 104. The computer I/O 104 will be further
described in conjunction with FIG. 1B. The user 102 initiates
one of several modules or functions 106-114 that output to
either the VIR Engine 120 or a database engine 130. The
database engine 130 can be one of the many commercially
available database engines, such as those available from
Informix Software, Inc., or IBM DB2.

An "Alpha-numeric query input" module 106 allows the 
user to specify a target object by alpha-numeric attributes,
such as shown in an exemplary Query Window screen of 
FIG. 3. The output of this module bypasses the VIR Engine 
120 and is used as a direct input to the database engine 130. 

A "Query Canvas" module 108 provides a visual query 
input to the VIR Engine 120. The Query Canvas module 108 
will be further described in conjunction with FIG. 2. 

An "Image Browsing" module 110 provides a visual 
input, such as an image from a file or database accessible to 
the user 102. The file or database may be on the user's 
computer, such as on a hard drive, CD-ROM, digital video/ 
versatile disk (DVD) drive, tape cartridge, ZIP media, or 
other backup media, or accessible through a network, such 
as a local area network (LAN), a wide area network (WAN) 
or the Internet. The visual input is provided to the VIR
Engine 120.

An "Insertion" module 112 is used to provide
one or more new images to be added to a database 132
accessible by the database engine 130. The new image(s) are
provided as inputs to the VIR Engine 120. Note that
references to the database 132 may be to a portion or a partition
of the entire database, such as, for example, visual objects
associated with a particular domain. Therefore, visual
objects for multiple domains or subsets of a domain could be
stored in separate databases or they may be stored in one
database.

An "Other Database Management" module 114 is used to 
initiate standard database operations on database 132. Mod- 
ule 114 communicates directly with the database engine 130. 

The VIR Engine 120 comprises two main modules: an 
"Image Analysis" module 122 and an "Image Comparison" 
module 124. The image analysis module 122 receives inputs 
from either module 108 or 110 to generate a query target or 
from the insertion module 112 for adding a new image into 
the database 132. The output of the image analysis module 
122 is a feature vector (FV) that describes the visual object 
passed to it by one of modules 108, 110 or 112. The FV is 
passed on to the database engine 130. In addition, if module 
112 was used to insert the image into the database, both the 
FV for the image and the image itself are stored in the 
database 132 (as seen in FIG. 5B). The image analysis 
module 122 will be described in greater detail hereinbelow. 

The image comparison module 124 receives a query 
target FV and a FV for the image being tested or compared 
from the database engine 130. The output of the image 
comparison module 124 is a similarity score that is sent to 
a "Ranked List Management" module 134. A plurality of 
images from the database 132 are compared one at a time to 
the query image by the image comparison module 124. The
resultant similarity scores are accumulated by the module
134 so as to rank the images in order of their similarity to
the query image. The ranked results of the list management
module 134 are provided to a "Thumbnail Results Browser" 
136 for display to the user 102 through the computer I/O 
104. An exemplary screen display of ranked results is shown 
in FIG. 4. 

Referring now to FIG. 1B, a hardware configuration for
the VIR system of FIG. 1A will be described. A computer or
workstation 140a communicates with a server 160 by a
network 162, such as a local area network (LAN) or wide
area network (WAN). One or more additional computers or
workstations 140b can be connected to the server 160 by the
network 162. The computers 140a and 140b can be a
personal computer, such as one utilizing an Intel microprocessor
chip (at minimum, an 80486 model) or a Motorola PowerPC
chip, or a workstation utilizing a DEC Alpha chip, a SPARC
chip, a MIPS chip, or other similar processor 144. A
computer enclosure 142 contains the processor 144, a storage
device 146 connected to the processor 144, preferably of at
least 1-2 Gigabytes, and a memory of at least 32 Megabytes
(not shown). Connected to the processor 144 are a plurality
of I/O devices 104 (FIG. 1A) including a visual monitor 148,
a printer 150, a pointing device (such as a mouse, trackball
or joystick) 152, and a keyboard 154. Optional I/O devices
include a scanner 156 and a backup unit 158. The server 160
typically has similar or greater processing power than the
computers 140a and 140b but typically has a larger capacity
storage device and memory. The server 160 also has a
backup facility to safeguard the programs and data. The
server 160 may be connected to remote computers similar to
computer 140a by a modem 164 to another network 166,
which may be a WAN or the Internet for example.

The present invention is not limited to a particular
computer configuration. The hardware configuration described
above is one of many possible configurations. Other types of
computers, servers and networks may be utilized.

In one embodiment of the system 100, the modules shown
in FIG. 1A may all be physically located on one computer
140a. In another embodiment of system 100, the computer
I/O 104, and modules 106-114 and 134-136 could be
located on computer 140a, while the VIR Engine 120, the
database engine 130 and the database store 132 could all be
located on the server 160. In yet another embodiment of
system 100 that is similar to the previous embodiment, the
VIR Engine 120 could be on server 160 and the database
engine 130 and the database store 132 could be located on
another server (not shown) on the network 162. Other
combinations of the above modules are also possible in yet
other embodiments of the system 100. Furthermore,
individual modules may be partitioned across computing
devices.
Query Canvas 

Referring to FIG. 2, an exemplary screen display 180 of
the Query Canvas module 108 will be described. The Query
Canvas is a specific user-interface mechanism that is an
enhancement to the query specification environment. The
Query Canvas provides a bitmap editor to express the query
visually, and serves as an input to the Image Analysis
module 122 (FIG. 1A). The canvas may begin as a blank
slate in a canvas window 181, or may have an existing image
pre-loaded into it (drag and drop an image from an existing
image collection) prior to modification with a set of
painting/drawing tools. These tools include, for example,
standard brushes 184, pens, region fills, a magic wand to
define regions, ovals 186, rectangles, lines, and so forth. A
color palette 188 is provided, with the ability to define new
colors from a color chooser. A palette of textures 190 is also
provided, with the ability to select new textures from a large
library.

Once an image, such as image 182, has been created, it
can be submitted as a query to the system. The Query
Canvas tool saves the user significant initial browsing time
in those cases where he or she already has an idea of what
the target images should look like. Since the query canvas
allows modification of images, it encompasses the
functionality of the "query-by-sketch" paradigm.
Of course, one will recognize that the present invention is
not limited to any particular type of query creation.
Query Window

Referring to FIG. 3, an exemplary screen display 200 of
a Query Window will be described. The Query Window or
form 200 is provided to specify alphanumeric information
201 such as keywords, [illegible] client names, and so
forth. The Query Window 200 also shows an iconic image
202 of the current contents of the Query Canvas 108
(FIG. 1A), which expresses the visual component of the
query. [illegible] the most important [illegible] Query
Window 200 are the sliders (such as slider 208) to control
the relative importance of the visual and textual aspects of
the query. There are sliders to indicate the importance of
visual query attributes such as Color, Texture, Shape, and
Location, and textual query attributes such as Keywords.
The ability to select perceptual weights [illegible] is a
[illegible] of the visual query over which the user has
control. Of course, other attributes and ways of selecting
weights are encompassed by the present invention.
Query Results

Referring to FIG. 4, an exemplary screen display 220 of
Query Results will be described. The Query Results 220 are
displayed to the user 102 by the thumbnail results browser
136 (FIG. 1A). A thumbnail (reduced size) image 222 of the
query image is preferably shown in the upper left corner of
the visual display 148 (FIG. 1B). A thumbnail 224 of the
image that has the best similarity score, indicative of the
closest match to the query image, is shown to the right of the
query image 222. A thumbnail 226 of the image having the
second best similarity score is shown to the right of image
224, and so forth for a predetermined number of thumbnail
images shown to the user 102. A mechanism (not shown) to
access a next screen of ranked thumbnails is available. The
similarity score of each of the ranked images may be
optionally shown in conjunction with the thumbnails. Of
course, the present invention is not limited to the particular
presentation of search results.
Operational Flow of Base VIR System

Referring to FIGS. 5A and 5B, a high-level flow diagram
showing the operation 240 of the VIR system 100, including
the Base VIR Engine 120, will be described. The user 102
(FIG. 1A) preferably initiates query generation 242 by either
utilizing the query canvas 108 to create a query, browsing
110 the available file system to locate an existing object to
use as the query, or browsing 246 the database store 132
(FIG. 1A and FIG. 5B) to identify an image that has already
been analyzed by the analysis module 122. In the last
situation, if the image is already in the database 132, a
feature vector has been computed and is retrieved at state
247 from a feature vector storage portion 264 of the
database 132. A target image I_T 248 results if either the
query canvas module 108 or the browse file system module
110 is used to generate a query. The target image 248 is
input to the analysis module 122 to generate a feature vector
for the target image as the output. Because of the importance
of the primitives in the system 100, a digression is now made
to describe the base system primitives.
Default Primitives

The Base VIR Engine 120 has a fixed or default set of
primitives. Primitives and their weights are identified and
indicated using a tagging mechanism to identify them in the
API calls. The default primitives of the presently preferred
Base Engine are:
Local Color (250): analyzes localized color and the spatial
match-up of color between two images.
Global Color (252): considers both the dominant color
and the variation of color throughout the entire image.
Structure (254): determines large scale structure in the
image as represented mainly by edges with strong
matching for the location and orientation of edge
features.
Texture (256): analyzes areas for periodicity, randomness,
and roughness (smoothness) of fine-grained textures in
images.

Returning now to the analysis module 122, the analysis
[illegible] operations, such as smoothing and contrast
enhancement, to make the image [illegible] for
primitive-extraction routines. Each primitive-extraction
routine takes a preprocessed image and, depending on the
properties of the image, computes a specific set of data,
called feature data, for that primitive. [illegible] typically
[illegible] feature that is extracted by one primitive. The
feature data [illegible] a mathematical characterization of
the visual feature. A feature vector is a concatenation of a
set of feature data elements corresponding to a set of
primitives in a schema (further described hereinbelow). The
feature vector preferably has header information that maps
the feature data contained within it.

[illegible] the analysis module 122 is utilized to
[illegible] the database 132, the feature vector of the
computed primitive data is stored in a data structure 264.
[illegible] the application provides a raw image buffer to
the VIR Engine, and the Engine returns a pointer to a set of
data containing the extracted primitive data. The application
is then responsible for storing and managing the data in a
[illegible] fashion. The Engine operates in a stateless
fashion, which means it has [illegible] of how the primitive
data is organized or stored, or how the results of queries are
managed. There is no transaction management at the Engine
API level. This property means that system developers need
not worry about conflicts between the VIR Engine and other
system components such as databases, client-server
middleware, and so forth.

Proceeding to state 260 of FIG. 5A, the feature vector of
[illegible] is submitted [illegible] (FIG. 5B) the Query
Processor 261 obtains a candidate feature vector for
[illegible] from feature vector storage 264 (part of database
132). The feature vector of the [illegible] and the candidate
feature vector (FV [illegible]) are then both [illegible] to
the comparison module 124.
Comparisons

There are several ways to compare images using the API.
Each method involves computing one or more similarity
distances for a pair of primitive vectors. The computation of
[illegible] is performed in [illegible]. First, for each
primitive, such as local color 270, global color 272,
structure 274 or texture 276, a similarity distance (score) is
computed. The similarity scores for primitives are further
discussed in conjunction with FIG. [illegible]. These scores
(s_i) are combined at state 280 with weights (w_i) 282 by a
judiciously chosen function that forms a final score. The
final combined score may, for instance, be generated by a
linear combination or a weighted sum as follows:

$$S = \sum_i w_i s_i$$

The final score is used to rank results 286 at state 284 by
similarity. An image 288 with the best score (the lowest
score in the presently preferred embodiment) is ranked as the
closest match. Of course, the definition of "similarity" at this
point is determined by the set of weights 282 used.
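
The weighted-sum combination and lowest-score-first ranking described above can be sketched as follows. This is a minimal illustration in Python, not the Engine's API: the function names `combine_scores` and `rank_candidates`, the primitive labels used as dictionary keys, and the sample distances are all hypothetical.

```python
from typing import Dict, List, Tuple


def combine_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum S = sum_i w_i * s_i over the per-primitive distances."""
    return sum(weights.get(name, 0.0) * s for name, s in scores.items())


def rank_candidates(
    per_image_scores: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (image_id, final_score) pairs, lowest (best) score first."""
    ranked = [
        (combine_scores(scores, weights), image_id)
        for image_id, scores in per_image_scores.items()
    ]
    ranked.sort()
    return [(image_id, score) for score, image_id in ranked]


# Hypothetical per-primitive distances (0 = identical, 100 = worst match).
per_image = {
    "img_a": {"local_color": 10.0, "global_color": 20.0, "structure": 80.0, "texture": 40.0},
    "img_b": {"local_color": 50.0, "global_color": 60.0, "structure": 5.0, "texture": 10.0},
}
color_heavy = {"local_color": 0.4, "global_color": 0.4, "structure": 0.1, "texture": 0.1}
structure_heavy = {"local_color": 0.1, "global_color": 0.1, "structure": 0.6, "texture": 0.2}

by_color = rank_candidates(per_image, color_heavy)
by_structure = rank_candidates(per_image, structure_heavy)
```

With the same per-primitive distances, the two weight sets produce different rankings, which is the sense in which the weights define "similarity" for a given query.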

Applications may also synthesize a property weighting
(such as "composition") by intelligently applying weights
during comparisons. If "composition" is weighted low, then
global primitives should be emphasized; if it is weighted
high, then local primitives should be emphasized.
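
One possible way to synthesize such a property is sketched below in Python. The linear mapping, the 0-to-1 range of the setting, and the name `composition_weights` are assumptions made for illustration; the text only states that a low setting favors global primitives and a high setting favors local ones.

```python
def composition_weights(composition: float) -> dict:
    """Map a 0..1 "composition" setting to weights for the two color primitives.

    Assumed behavior: low values emphasize the global primitive, high values
    emphasize the local primitive, with a simple linear trade-off in between.
    """
    if not 0.0 <= composition <= 1.0:
        raise ValueError("composition must be between 0 and 1")
    return {"local_color": composition, "global_color": 1.0 - composition}


low = composition_weights(0.2)   # global color dominates
high = composition_weights(0.9)  # local color dominates
```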

Decision state 290 determines if there are more images in
the database 132 that need to be evaluated by the comparison
module 124. If so, the Query Processor continues at state
262 by obtaining the next candidate feature vector. If all the
candidate images in the database 132 have been evaluated,
processing advances to state 292 wherein the thumbnails
corresponding to a predetermined number of top-ranked
images are retrieved from the image storage portion 266 of
database 132 and are displayed to the user at state 294.
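
The loop over candidate feature vectors (states 262 through 292) can be sketched as below. This is a hedged Python illustration: `top_matches` and `abs_diff` are hypothetical names, and a single float stands in for each stored feature vector so that the example stays self-contained.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def top_matches(
    query_fv: float,
    candidates: Iterable[Tuple[str, float]],
    compare: Callable[[float, float], float],
    n: int,
) -> List[Tuple[float, str]]:
    """Compare the query against every candidate, one at a time, and keep
    the n candidates with the lowest (best) distance."""
    scored = ((compare(query_fv, fv), image_id) for image_id, fv in candidates)
    return heapq.nsmallest(n, scored)


def abs_diff(a: float, b: float) -> float:
    """Toy comparison function: distance between two one-number 'vectors'."""
    return abs(a - b)


# Hypothetical stored (image_id, feature vector) pairs.
stored = [("img1", 5.0), ("img2", 1.0), ("img3", 9.0), ("img4", 2.5)]
best = top_matches(2.0, stored, abs_diff, n=2)
```

Only the identifiers of the best `n` results are kept, so the thumbnails need to be fetched from image storage only for those images.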
Management 

There are several supporting functions that fall in the
category of "management." These include initialization,
allocation and de-allocation of weights and scores
structures, and management of primitive vector data.

III. THE EXTENSIBLE VIR ENGINE AND SYSTEM

The Extensible VIR Engine introduces the notion of a
"schema". A schema is a specific collection of primitives
(default and/or application-specific) which are used in an
application for the purpose of comparing images. When a
group of primitives is registered, the system returns a
schema ID to be used for future reference when creating
weights and scores structures.

The Extensible VIR Engine is an open, portable and
extensible architecture to incorporate any domain specific
information schema. The Extensible VIR Engine
architecture can be extended not only across application domains,
but across multiple media such as audio, video, and
multi-dimensional information.

The purpose of the Extensible Engine is to provide to the
application developer the flexibility of creating and adding
custom-made primitives to the system. For example, a
face-matching system might construct primitives called
"LeftEye" and "RightEye", and provide an interface that
compares faces based on the similarity of their eyes.
Developer-Defined Primitives 

In terms of the VIR Engine, a collection of vectors 
representing a single category of image information is a 
primitive. A primitive is a semantically meaningful feature 
of an image. Thus color, texture, and shape are all general
image primitives. Of course, not all primitives will be
applicable across all images. For instance, a color primitive
may have no relevance with respect to X-ray imagery. In
practice, a primitive is specified by a developer as a 6-tuple
of the following values:
Static information
primitive_id: a unique primitive identifier
label: a category name for the primitive
Data retrieval functions

analysis_function: This function accepts
the image data, computes its visual feature data,
and stores it in a buffer. The function must accept an
RGB image buffer and its attributes (height, width) and,
based on this information, perform any desired
computation on the pixel data in the buffer. The results of
this computation (i.e., feature computation) can be
anything. The primitive decides what it wants to
return as the feature data. The feature data is returned
by passing back a pointer to the data and a byte count
telling the VIR Engine how much data is there. The
Engine then takes the data and adds it to the vector
being constructed.
compare_function: This function returns the
similarity score for its associated primitive. The query
operations of the engine call this function with two
data buffers (previously created with
analysis_function( )) to be compared. The score which is
returned is preferably in the range [0.0 . . .
100.0], wherein a "perfect" match returns a value of
zero and a "worst" match returns a value of 100. The
score is best considered to be a "distance" in "feature
space". For maximum discrimination, the spectrum
of distances returned for this primitive should be
spread over this range evenly or in a reasonably
smooth distribution.
Data management functions

swap_function: The engine takes full responsibility
for handling the byte order difference between
hardware platforms for easy portability. This allows data
that is computed on a certain platform to be easily
used on any other platform, regardless of byte-order
differences. Each primitive supplies this function,
which will do the byte-order conversions of its own
data. The engine will automatically use this function
when necessary, to provide consistent performance
across any platform.
print_function: This function is used to print out the
desired information of the associated primitive.
After a primitive is defined, it is registered with the 
Extensible VIR Engine using the RegisterPrimitive( ) func- 
tion. Once registered, data associated with a custom primi- 
tive is managed in the visual feature structures in the same 
manner as the default primitives. From there, the new 
primitive can be incorporated into any schema definition by 
referencing the primitive_id, just like a built-in (default)
primitive. Application developers may define any type of
data structure(s) to handle the data associated with their
primitive. It is preferably required that the structure(s) can
collapse into a BLOB to be passed back and forth via the 
registered procedures. In addition to the above primitive 
information, an estimated cost of comparison may also be 
supplied for the primitive, to aid in query optimization 
performed by the engine. 
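
The 6-tuple and its registration can be sketched as follows. This Python illustration is an assumption-laden stand-in, not the Engine's interface: `Primitive`, `register_primitive`, the id 1001, and the toy "MeanBrightness" primitive are all hypothetical, and the feature data is a single little-endian float32 packed into a byte string (playing the role of the BLOB).

```python
import struct
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Primitive:
    """The six values a developer supplies for one primitive."""
    primitive_id: int
    label: str
    analysis_function: Callable[[bytes, int, int], bytes]
    compare_function: Callable[[bytes, bytes], float]
    swap_function: Callable[[bytes], bytes]
    print_function: Callable[[bytes], str]


_registry: Dict[int, Primitive] = {}


def register_primitive(primitive: Primitive) -> int:
    """Hypothetical stand-in for RegisterPrimitive(): store the tuple by id."""
    if primitive.primitive_id in _registry:
        raise ValueError(f"primitive id {primitive.primitive_id} already registered")
    _registry[primitive.primitive_id] = primitive
    return primitive.primitive_id


def analyze_brightness(rgb: bytes, width: int, height: int) -> bytes:
    """Feature data: mean of all RGB byte values, packed as one float32."""
    if not rgb or len(rgb) != width * height * 3:
        raise ValueError("buffer size does not match width * height * 3")
    return struct.pack("<f", sum(rgb) / len(rgb))


def compare_brightness(a: bytes, b: bytes) -> float:
    """Distance in [0, 100]: 0 is a perfect match, 100 the worst match."""
    (x,) = struct.unpack("<f", a)
    (y,) = struct.unpack("<f", b)
    return abs(x - y) / 255.0 * 100.0


def swap_brightness(blob: bytes) -> bytes:
    """Byte-order conversion for the single 4-byte float."""
    return blob[::-1]


def print_brightness(blob: bytes) -> str:
    return f"mean brightness = {struct.unpack('<f', blob)[0]:.1f}"


BRIGHTNESS = Primitive(1001, "MeanBrightness", analyze_brightness,
                       compare_brightness, swap_brightness, print_brightness)
register_primitive(BRIGHTNESS)

black = analyze_brightness(bytes(12), 2, 2)          # 2x2 all-black image
white = analyze_brightness(bytes([255] * 12), 2, 2)  # 2x2 all-white image
```

Here `compare_brightness(black, white)` yields the maximum distance of 100.0 and comparing a buffer with itself yields 0.0, matching the preferred score range described above.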

In another implementation of the present inventive exten- 
sible search engine, a primitive may be defined in an 
object-oriented language such as, for example, C++. In an 
object-oriented language, an object is defined to include data 
and methods for operating on the data. One text for C++ 
programming, C++ Primer by Stanley Lippman, Second 
Edition, Addison-Wesley, is incorporated herein by
reference.

Objects are created from classes defined by the author of 
an API. The base class may then be subclassed to provide a 
specific primitive, a color primitive for instance. The API 
author will then overload, say, a compare function and an 
analysis function. Thus, an extended primitive is added to 
the engine by object-oriented subclassing and function (or 
method) overloading. Such an embodiment will be under- 
stood by one of skill in the relevant field of technology. 
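
The subclassing approach can be sketched as follows. The text describes C++ classes; this illustration uses Python's `abc` module instead, and the class names `PrimitiveBase` and `MeanRedPrimitive`, as well as the one-byte feature format, are hypothetical.

```python
from abc import ABC, abstractmethod


class PrimitiveBase(ABC):
    """Abstract interface that every primitive must implement."""

    label = "base"

    @abstractmethod
    def analyze(self, rgb: bytes, width: int, height: int) -> bytes:
        """Compute feature data from an RGB buffer."""

    @abstractmethod
    def compare(self, a: bytes, b: bytes) -> float:
        """Return a distance in [0, 100] between two feature buffers."""


class MeanRedPrimitive(PrimitiveBase):
    """Toy color primitive: the feature is the mean red value as one byte."""

    label = "MeanRed"

    def analyze(self, rgb: bytes, width: int, height: int) -> bytes:
        reds = rgb[0::3]  # every third byte is a red sample
        return bytes([sum(reds) // max(len(reds), 1)])

    def compare(self, a: bytes, b: bytes) -> float:
        return abs(a[0] - b[0]) / 255.0 * 100.0


red = MeanRedPrimitive()
feature = red.analyze(bytes([255, 0, 0, 0, 0, 0]), 2, 1)  # one red, one black pixel
```

An engine written against `PrimitiveBase` can call `analyze` and `compare` on any subclass without knowing which primitive it is handling.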

More specifically, abstract C++ classes using pure virtual
functions may define the interface. Furthermore, the
object-oriented system implementation could follow the Object
Management Group (OMG) standards. Presently, OMG is
working on an Object Query Service standard which is
defined by Object Services Architecture (Revision 6.0),
which is incorporated by reference. Further information on
object-oriented database standards can be found in The
Object Database Standard: ODMG 93, edited by Cattell,
Morgan Kaufmann Publishers, which is incorporated herein
by reference.
Schema Definition

Databases require a consistent structure, termed a schema,
to organize and manage the information. As used herein, in
particular, a schema is a specific collection of primitives. A
specific schema implies a specific set of visual features to be
processed and a corresponding feature vector to be used for
content-based similarity scoring. A VIR Engine schema is
defined as a 2-tuple: a schema id, and an ordered set of
primitives. Similar to primitives, the Extensible VIR Engine
is notified of a new schema by a RegisterSchema( ) function.
The primitive IDs referenced here must have previously
been defined using RegisterPrimitive( ), or must be one of
the default primitives. The order in which the primitives are
referenced dictates the order in which their functions are
called during feature extraction (but not during query
processing). This allows primitives to work synergistically
and share computational results. A single application is
allowed to define and use multiple schemas. The Extensible
VIR Engine operates as a stateless machine and therefore
does not manage the data. Hence the calling application
manages the storage and access of the primitive data
computed from any schema. The application developer must
manage the schema_id that is returned from the registration.

Preferably, the schema itself is expressed as a
NULL-terminated array of unsigned 32-bit integers, each
containing the ID of the desired primitive. The primitive IDs
referenced here must have previously been defined using
RegisterPrimitive( ), or must be one of the default primitives.
Primitive Design

The "pistons" of the VIR Engine are the primitives. A
primitive encompasses a given feature's representation,
extraction, and comparison function. There are a number of
heuristics which lead to effective primitive design. These
design constraints are not hard rules imposed by the Engine
architecture, but rather goals that lead to primitives which are
"well-behaved". For a given application an engineer may
choose to intentionally relax certain constraints in order to
best accommodate the tradeoffs associated with that domain.
The constraints are as follows:
meaningful: Primitives should encode information
which will be meaningful to the end-users of the
system. Primitives, in general, map to cognitively
relevant image properties of the given domain.
compact: A primitive should be represented with the
minimal amount of storage.
efficient in computation: Feature extraction should not
require an unreasonable amount of time or resources.
efficient in comparison: Comparison of features should
be extremely efficient. The formulation should take
advantage of a threshold parameter (when available),
and avoid extraneous processing once this threshold
has been exceeded. The distance function should return
results with a meaningful dynamic range.
accurate: The computed data and the associated
similarity metric must give reasonable and expected results for
comparisons.
indexable: The primitive should be indexable. A
secondary data structure should be able to use some associated
value(s) for efficient access to the desired data.

In addition, primitives can provide their own "back door"
APIs to the application developer, and expose parameters
that are controlled independently from the weights interface
of the VIR Engine. There is also ample opportunity for a set
of domain primitives to cooperate through shared data
structures and procedures (or objects) in such a way that they
can economize certain computations and information.
Primitives include a mechanism called "primitive
extensions" for enriching the API. This allows the
application greater control over the behavior of the primitives and
[illegible] of comparisons. For example, a texture primitive
may expose a set of weights for sub-components of texture
such as periodicity, randomness, roughness and orientation.
These parameters would be specialized and independent of
the main texture weight passed through the Compare module
entry points.
Universal Primitives

Several universal, or default, primitives are included
with the Base VIR Engine. These primitives are universal in
the sense that they encode features which are present in most
images, and useful in a wide class of domain-independent
applications. Each of these primitives is computed using
only the original data of the image. There is no manual
intervention required to compute any of these primitives. A
developer can choose to mix-and-match these primitives in
conjunction with domain specific primitives to build an
application. These primitives have been designed based on
the above heuristics.
Global color: This primitive represents the distribution
of colors within the entire image. This distribution also
includes the amounts of each color in the image.
However, there is no information representing the
locations of the colors within the image.
Local color: This primitive also represents the colors
which are present in the image, but unlike Global color,
it emphasizes where in the image the colors exist.
Structure: This primitive is used to capture the shapes
which appear in the image. Because of problems such
as lighting effects and occlusion, it relies heavily on
shape characterization techniques, rather than local
shape segmentation methods.
Texture: This primitive represents the low level textures
and patterns within the image. Unlike the structure
primitive, it is very sensitive to high-frequency features
within the image.
Domain Specific Primitives

Applications with relatively narrow image domains can
register domain specific primitives to improve the retrieval
capability of the system. For applications such as retinal
imaging, satellite imaging, wafer inspection, etc., the
development of primitives that encode significant domain
knowledge can result in powerful systems. Primitives should obey
the design constraints listed above, but there is considerable
flexibility in this. For example, a wafer inspection primitive
may be designed to look for a specific type of defect. Instead
of an actual distance being returned from the distance
function, it can return 0.0 if it detects the defect, and 100.0
if not.
Analysis

Before an application can determine the similarity
between an image description and a set of candidate images,
the images must be analyzed by the engine. The resulting
feature data is returned to the caller to be used in subsequent
operations. Naturally, if an image is to be a candidate image
in future operations, the feature vector should be stored in a
persistent manner, to avoid re-analyzing the image.
analyze_image: This function accepts a memory buffer
containing the original image data. It performs an
analysis on the image by invoking the analysis
functions of each primitive. The results of this computation
are placed in memory and returned to the caller, along
with the size of the data. Maintenance and persistent
storage of this data is the caller's responsibility.
Eventually, these structures are passed into the image
comparison entry points.
destroy_features: This function is used to free the
memory associated with a visual feature that was
previously returned from analyze_image( ). Typically,
this is called after the application has stored the data
using the associated persistent storage mechanism.
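
The schema registration and analysis flow described above can be sketched as below. This is a minimal Python illustration under stated assumptions: `register_schema`, `analyze_image`, and `split_features` are hypothetical stand-ins for the Engine's functions, a terminating 0 in the id list mirrors the NULL-terminated array, and the header layout (a count followed by id and size pairs) is invented for the example since the text does not specify one.

```python
import struct
from typing import Callable, Dict, List, Tuple

AnalysisFn = Callable[[bytes, int, int], bytes]
PRIMITIVES: Dict[int, AnalysisFn] = {}
SCHEMAS: Dict[int, Tuple[int, ...]] = {}


def register_primitive(primitive_id: int, analysis: AnalysisFn) -> None:
    PRIMITIVES[primitive_id] = analysis


def register_schema(primitive_ids: List[int]) -> int:
    """Store an ordered, zero-terminated list of primitive ids; return a schema id."""
    ids = []
    for pid in primitive_ids:
        if pid == 0:  # terminator
            break
        if pid not in PRIMITIVES:
            raise KeyError(f"primitive {pid} is not registered")
        ids.append(pid)
    schema_id = len(SCHEMAS) + 1
    SCHEMAS[schema_id] = tuple(ids)
    return schema_id


def analyze_image(schema_id: int, rgb: bytes, width: int, height: int) -> bytes:
    """Call each primitive's analysis function in schema order and
    concatenate the results behind a header that maps the feature data."""
    ids = SCHEMAS[schema_id]
    parts = [PRIMITIVES[pid](rgb, width, height) for pid in ids]
    header = struct.pack("<I", len(parts))
    for pid, blob in zip(ids, parts):
        header += struct.pack("<II", pid, len(blob))
    return header + b"".join(parts)


def split_features(fv: bytes) -> Dict[int, bytes]:
    """Use the header to recover each primitive's feature data."""
    (count,) = struct.unpack_from("<I", fv, 0)
    entries = [struct.unpack_from("<II", fv, 4 + 8 * i) for i in range(count)]
    offset = 4 + 8 * count
    features = {}
    for pid, size in entries:
        features[pid] = fv[offset:offset + size]
        offset += size
    return features


def mean_red(rgb: bytes, width: int, height: int) -> bytes:
    return bytes([sum(rgb[0::3]) // (width * height)])


def mean_all(rgb: bytes, width: int, height: int) -> bytes:
    return bytes([sum(rgb) // len(rgb)])


register_primitive(1, mean_red)
register_primitive(2, mean_all)
schema_id = register_schema([2, 1, 0])  # order fixes the extraction order
fv = analyze_image(schema_id, bytes([255, 0, 0, 255, 0, 0]), 2, 1)
features = split_features(fv)
```

The returned byte string is what the calling application would store persistently. No counterpart to destroy_features is shown because Python reclaims the buffer automatically.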
Similarity/Scores 

Any image retrieval application requires the ability to 
determine the similarity between the query description and 
any of the candidate images. The application can then 
display the computed similarity value of all of the candidate 
images, or convey only the most similar images to the user. 
To do this, similarity scores are computed by the engine for 
the relevant candidate images. An application will call the 
comparison functions provided by the engine. These func- 
tions will return a score structure, which indicates the 
similarity between the images being compared. The score 
structure contains an overall numerical value for the
similarity of the two images, as well as a numerical value for
each of the primitives in the current schema. This allows
applications to use the values of the individual primitive 
comparisons, if necessary. 

When two images are compared by the engine, each of the
primitives in the current schema is compared to give
individual similarity values for that primitive type. Each of
these scores must then be used to provide an overall score 
for the comparison. In certain situations, these individual 
primitive scores may need to be combined differently, 
depending on the desired results. By altering the ways these 
individual scores are combined, the application developer 
has the ability to indicate relative importance between the 
various primitives. For example, at times the color distri- 
bution of an image will be much more important than its 
texture characteristics. There may also be cases where only 
some of the available primitives are required in order to 
determine which images should be considered the most 
similar. 
Weights 

Applications are given flexibility in how the overall score 
is computed through use of a weights structure. The weights 
structure includes a weight for each primitive. The applica- 
tion has control over the weight values for any given 
comparison through the weights structure, and the following 
functions. 

create_weights: This function is used to allocate a
weights structure for use in the compare functions. The
associated schema_id will determine the specific
format of the structure.

destroy_weights: This function is used to free the
memory previously allocated with create_weights( ).

set_weight: This function sets, in the weights
structure, the weight of the primitive identified by the
given primitive_id. The
value should be a positive floating point number. In
general, weights are normalized before use by calling
normalize_weights( ).

get_weights: This function is used to extract an
individual weight value from a weights structure.
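
A weights structure with these operations might look like the sketch below. It is a hypothetical Python illustration: the class `Weights`, its default weight of 1.0, and its acceptance of a zero weight (to switch a primitive off) are assumptions, and no destroy operation is needed because Python frees the object automatically.

```python
from typing import Dict, Iterable


class Weights:
    """One weight per primitive in a schema (illustrative only)."""

    def __init__(self, primitive_ids: Iterable[int]) -> None:
        # Assumed default: every primitive starts with equal weight.
        self._w: Dict[int, float] = {pid: 1.0 for pid in primitive_ids}

    def set_weight(self, primitive_id: int, value: float) -> None:
        if primitive_id not in self._w:
            raise KeyError(f"primitive {primitive_id} is not in this schema")
        if value < 0.0:
            raise ValueError("weights must not be negative")
        self._w[primitive_id] = float(value)

    def get_weight(self, primitive_id: int) -> float:
        return self._w[primitive_id]

    def normalize(self) -> None:
        """Scale the weights so that they sum to 1."""
        total = sum(self._w.values())
        if total == 0.0:
            raise ValueError("at least one weight must be positive")
        for pid in self._w:
            self._w[pid] /= total


weights = Weights([1, 2])
weights.set_weight(1, 3.0)
weights.set_weight(2, 1.0)
weights.normalize()
```

After normalization the two weights are 0.75 and 0.25, so the relative importance set by the caller is preserved while the scale of the final score stays comparable across queries.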
Note that other interesting visual parameters may be
surfaced in a user interface by combining the weights of the
primitives in intelligent ways. For example, a visual quantity
called "Composition" may be synthesized by controlling the
relative weighting of the color primitives.
Two examples of utilizing weights with the primitives by
use of the weights sliders (e.g., 208) in the query window
200 (FIG. 3) are as follows:
Texture: The VIR Engine evaluates pattern variations 
within narrow sample regions to determine a texture 
value. It evaluates granularity, roughness, 
repetitiveness, and so on. Pictures with strong textural 
attributes — a sandstone background for example — tend 
to be hard to catalog with keywords. A visual search is 
the best way to locate images of these types. For best 
results, a user should set Texture high when the query 
image is a rough or grainy background image and low 
if the query image has a central subject in sharp focus 
or can be classified as animation or clip-art. 
Structure: The VIR Engine evaluates the boundary char- 
acteristics of distinct shapes to determine a structure 
value. It evaluates information from both organic 
(photographic) and vector sources (animation and clip 
art) and can extrapolate shapes partially obscured. 
Polka dots, for example, have a strong structural ele- 
ment. For best results, a user should set Structure high 
when the objects in the query image have clearly 
defined edges and low if the query image contains 
fuzzy shapes that gradually blend from one to another. 

Comparison 

To get the result of an image comparison, the application 
supplies the precomputed primitive vectors from two 
images, together with a set of weights, to a first API called
Compare. The system fills in a score data structure and
returns a pointer to the caller. A second API called
CompareIntoScores caches the primitive component scores for
later use. A function RefreshScores can efficiently recompute a
new score for a different set of weights (but the same query
image, i.e., a re-query). This call takes a score
structure and a weights structure, and recomputes a final 
score (ranking) without needing to recompute the individual 
primitive similarities. A third API call (ThresholdCompare) 
is an extension of the first, in that the user also supplies a 
threshold value for the score. Any image having a distance 
greater than this value is considered non-qualifying, which 
can result in significant performance gains since it will 
probably not be necessary to compute similarity for all 
primitives. 
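
The relationship between the three calls can be sketched in Python as follows. This is a hedged illustration, not the engine's API: the lower-case function names, the representation of feature vectors and weights as dictionaries keyed by primitive, and the `compare_fns` table are assumptions. The early exit also assumes that primitive distances and weights are non-negative, so that the partial weighted sum can only grow.

```python
def compare_into_scores(fv1, fv2, compare_fns):
    """Compute and cache one distance per primitive (no weighting yet)."""
    return {pid: fn(fv1[pid], fv2[pid]) for pid, fn in compare_fns.items()}


def refresh_scores(scores, weights):
    """Recombine cached primitive scores under a new set of weights."""
    return sum(weights[pid] * s for pid, s in scores.items())


def threshold_compare(fv1, fv2, weights, compare_fns, threshold):
    """Return (exceeded, score); stop once the partial sum passes threshold."""
    partial = 0.0
    for pid, fn in compare_fns.items():
        partial += weights[pid] * fn(fv1[pid], fv2[pid])
        if partial > threshold:
            # Remaining primitives are never compared for this candidate.
            return True, None
    return False, partial
```

The design point is that `refresh_scores` touches only the cached per-primitive numbers, so a re-query with different weights costs one multiply and add per primitive, while `threshold_compare` can skip the remaining primitive comparisons for a candidate that is already out of range.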

Every application may have unique requirements in the 
way the application determines which images are to be 
considered most similar, and how to efficiently manage a 
changing set of results. Certain applications may need to do 
an exhaustive comparison of all images in the candidate set 
while others are only "interested" in a certain set which are 
most similar to the query description. Certain applications 
(or situations) may also require the ability to quickly 
manipulate the relative importance of the primitives, using 
the individual primitive scores and weights, as discussed 
above. In another embodiment of the present engine, com- 
parison functions may be structured as follows: 

compare — This is the simplest entry point for computing
the overall visual similarity for two given images,
represented by their respective visual features. The
caller passes in a weights structure and two feature
vectors, and compare( ) computes and returns the
weighted overall score, which is a numerical value
preferably in the range [0.0 . . . 100.0]. This function
can be used when a score is required for every candidate
image. If only the top N scores are required, the
function threshold_compare( ) may be more appropriate.

heterogeneous_compare — This is a variation of the
standard compare described above, wherein the schemas
for each of the two images have the same primitives. In
the heterogeneous compare, each of the two images
may have been analyzed by use of a different schema.
For example, a feature vector for image A is based on
a different set of primitives than a feature vector for
image B.

threshold_compare — This function can be used for
optimized searches in which the scores of every single
candidate image are not required. A threshold similarity
distance is passed in to indicate that any image whose
score is above the threshold is not of interest for this
search. As soon as the engine determines that the image
is outside this range, it terminates the similarity
computation and returns a flag to indicate that the threshold
has been exceeded. This provides a significant performance
boost when top N style searches are sufficient.
Top N queries will be described in conjunction with
FIG. 14. Again, it is the application's responsibility to
determine the appropriate threshold value for each
comparison.

Query Optimization

A final aspect of the Extensible Engine is the notion of
query optimization. Each primitive provides a similarity
function to the Engine. During the threshold compare
operation the Engine attempts to visit the primitives in an
order such that it can determine as cheaply as possible if the
comparison score will exceed the passed-in threshold. As
soon as it is exceeded, the rest of the primitive comparisons
are aborted. Two main factors play into the query
optimization scheme: the weighting associated with each primitive
and the cost of executing the comparison operation for that
primitive. Application developers can tell the Engine what
the cost of their primitive's similarity function is during the
registration process. Developers that construct their own
primitives can help the optimizer by providing accurate cost
information for their custom Compare function. The following
description explains how to determine the cost of the
custom Compare function for the new primitive.

The cost value is a positive number which cannot be 0.0.
If the application uses all custom primitives, then the actual
values of these costs are not important. They should merely
be correct relative to one another: values of 1, 2, 3 are the
same as 100, 200, 300. However, if the application developer
wishes to integrate some custom primitives with the default
primitives previously described, then the cost values must be
calibrated with respect to the cost values for the default
primitives.

In one presently preferred embodiment the nominal
baseline for computation cost may be arbitrarily set by defining
that the VIR_GLOBAL_COLOR primitive has a cost of
1.0. On this scale, the default primitives have the following
costs:

Global Color    1.00
Local Color     2.20
Texture         4.10
Structure       2.30

To calibrate a custom primitive against this cost scale,
some empirical experiments must be performed and the
execution of the new procedures timed relative to the time
taken by the Global Color primitive. This ratio is the cost
value that should be passed to the primitive registration
procedure. A skeleton benchmark application is provided as
an example with the Extensible Engine API that can be used
to help develop new primitives and assess their cost. It
constructs a schema with only the Global Color primitive as
a timing baseline. The application developer then can
construct a schema with only the new primitive to establish its
cost relative to the Global Color primitive.

If the cost value for a new primitive is unknown, or if its
execution time varies widely depending on the image that is
being analyzed, then it is best to estimate the cost, or use the
value 1.0.

Flowchart and Architecture Descriptions

Referring to FIG. 6, the components of the extensible VIR
engine will be described. As previously described above, the
[illegible] C [illegible]. Of course, other
computer languages can be used for the API. The extensible
VIR engine 300 includes three main components: an analyzer
302, a comparator 304 and a primitive registration
interface 306. The analyzer 302 is similar to the analysis
module [numeral illegible] and the comparator 304 is similar
to the comparison module 124, previously shown in FIG. 1A.
The analyzer 302 has an analyze interface 308 to communicate
with external components. The analyze interface 308
receives an RGB format image as input 314 and generates a
feature vector as output 316. The comparator 304 has two
interfaces, a weights and scores interface 310 and a compare
interface 312. The weights and scores interface 310
communicates with a management function 318 handled by the
application. The compare interface 312 receives a target
feature vector 320 and a feature vector 322 for the current
image being tested or compared.

Associated with the extensible VIR engine 300 is a set of
primitives. A developer can specify a set of primitives that
are to be used for a specific domain. The extensible VIR
engine 300 includes [illegible: a list of primitives with
their reference numerals, one of which is 334]. The developer
may choose to use one or any number of these primitives for
the application. In addition, the developer may define one or
more custom primitives and register the primitives through the
primitive registration interface 306. The process of
registering new custom primitives will be further described
hereinbelow.

Referring now to FIG. 7, an exemplary VIR system utilizing
the extensible VIR engine 300 will be described.
The extensible VIR engine 300 communicates with the user
102 through a user interface 350. The user interface 350 may
include modules such as the Query Canvas module 118 and
[illegible] Browsing [illegible], which were previously
described in conjunction with FIG. 1A. The extensible VIR
engine 300 also is in communication with persistent storage
132 through a database interface 130. The database interface
130 is typically a database engine such as previously
described above. The application developer has complete
freedom [illegible] interface [illegible] to meet the needs
of the particular domain at issue.

Referring to FIG. 8, an operational flow 360 of the
extensible VIR engine 300 will now be described. The
engine flow 360 is invoked by an application such as the
example shown in FIG. 7. Beginning at a start state 362, the
engine moves to process 364 to register one or more
primitives through the primitive registration interface 306
(FIG. 6). Process 364 will be further described in
conjunction with FIG. 13. In typical operation of the extensible
VIR engine 300, the user will provide a query object, such as
through use of the Query Canvas 108 (FIG. 5A) or by
browsing the file system 110 to identify the query object.
Moving to a run analyzer process 366, a query object is
analyzed by the analyzer 302 (FIG. 6) to create a feature
vector for the query image. Proceeding to state 368, the user
typically provides or sets weights through the user interface
350 (FIG. 7). Moving to a run comparison process 370, the
comparator 304 (FIG. 6) determines a similarity score for
the two feature vectors that are passed to it. The compare
operation is typically performed on all the images in the
database 132 unless a database partition has been identified
or another scheme to compare or test only a portion of the
images in database 132 is established. Once all the images
have been compared by the run comparison process 370, the
engine moves to end state 372 and control returns to the
calling application.
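
The flow of FIG. 8 after primitive registration (analyze the query, apply the weights, compare against every stored image or a chosen partition) can be sketched in Python as below. This is an assumption-laden outline: `run_query`, the `analyze` and `compare` callables, the dictionary used as the database and the `partition` argument are hypothetical names, and the final sort is added for convenience rather than taken from the flow itself.

```python
def run_query(query_image, database, analyze, compare, weights, partition=None):
    """Score the query against every stored feature vector, or a partition."""
    query_fv = analyze(query_image)                  # run analyzer process 366
    ids = partition if partition is not None else list(database)
    results = []
    for image_id in ids:                             # run comparison process 370
        score = compare(query_fv, database[image_id], weights)
        results.append((score, image_id))
    return sorted(results)                           # smallest distance first
```

The stored feature vectors are assumed to have been computed ahead of time, so only the query object is analyzed when the query runs.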

Referring to FIG. 9, another embodiment of a VIR system 
utilizing the extensible VIR engine 300 will now be 
described. As previously described in conjunction with FIG.
5A, several methods of generating a query have been shown.
One of these methods includes the query generation and
Query Canvas method 242/108, whereby the user draws or
sketches a query image or modifies an existing image.
Alternatively, the user may browse the file system 390 to
identify an object or image to be used as the query 314. The
query object 314 is passed on to the analyzer 302 for analysis
to generate a feature vector 316 for the query. The feature
vector 316 is sent to the database engine 130. Generally, the
feature vector for the query image is only needed temporarily
to process the query. The query feature vector is
usually cached in random access memory (RAM) associated
with the database engine 130, for the query operation. For
some database implementations, the query feature vector is
placed in a temporary table by the database engine 130.

A feature vector for the query target 320 and a feature 
vector 322 for one of the images in the database store 132 
are retrieved by the database engine 130 and sent to the 
comparator 304 for comparison. At the comparator 304, a 
thresholding decision 394 is checked to determine if
thresholding is to be applied to the comparison method. If not, a
standard comparison 396 will be performed utilizing the
weights 400 as set by the user 102 (FIG. 1A). The standard
comparison 396 will be further described in conjunction
with FIG. 11. If thresholding is desired, the comparison will
be performed by the threshold comparison process 398 also
utilizing the weights 400. The threshold comparison 398 will
be further described in conjunction with FIG. 12. A
similarity score 324 is output by either the threshold comparison
398 or the standard comparison 396. The similarity score
324 is utilized by the calling application for use in presenting 
the resultant images. Presentation may be putting thumb- 
nails in a ranked order, for example. 

Referring to FIG. 10, the analysis performed by the run 
analyzer process 366 (FIG. 8) will now be described. Recall
that a schema is a collection of primitives defined by a
developer or application programmer. These primitives may
include some or all of the universal primitives built into the
VIR engine and any custom primitives defined by the
developer for a schema. Also recall that each custom
primitive must have an analysis function and a comparison
function, and the primitive is registered through the
primitive registration interface 306 (FIG. 6). These functions
along with the analysis and comparison functions for the
universal primitives are all stored in a lookup table for the
schema. 

The process 366 takes as input an image and provides as
output a feature vector. Beginning at a start analysis state
410, the analysis process 366 moves to a state 412 to
construct a header for the feature vector. A schema ID for the
object or image that is to be analyzed is an input to the
construct header state 412. The schema ID is obtained from
the schema creation process described in conjunction with
FIG. 13. The user identifies the schema to be used for
analysis of the visual objects through the application pro- 
gram. Using the schema ID, the corresponding schema or 
lookup table structure is accessed which lists the respective 
primitives and functions. There is one individual lookup 
table per schema. Accessing the first primitive in the lookup 
table for the schema at state 414, the analysis process 366 
proceeds to state 416 and looks up the analysis function for 
that primitive in the schema lookup table. Proceeding to 
state 418, the analysis function for the current primitive is 
called and the analysis function is performed. The input to 
the analysis function at state 418 is the image to be analyzed 
including its height and width characteristics. The output of 
state 418 is the feature data for the current primitive which 
is placed in the feature vector under construction. Any of 
various statistical techniques are used in the analysis func- 
tion for the current primitive. For example, histogramming 
could be used, such as a color histogram. As another 
example, a mean intensity primitive could be defined as the 
sum of the intensity of all the pixels in an image divided by 
the number of pixels in the image. 

These techniques are well-known by those skilled in the 
relevant technology. Proceeding to decision state 420, the 
analysis process 366 determines if there are additional 
primitives in the current schema that need to be processed. 
If so, the analysis process 366 moves back to state 414 to 
access the next primitive in the current schema. If all the 
primitives in the current schema have been processed, the 
analysis process proceeds to state 422 to finalize the feature
vector for the current image. At state 422, the analysis
process 366 computes the total resulting size of the feature 
data and updates the size in the header for the feature vector. 
In another embodiment, checksums are also computed at 
state 422. The complete feature vector contains the header 
information and the feature data for each of the primitives in 
the schema. The analysis process 366 completes at a done 
state 424. 
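
The analysis loop of FIG. 10 can be sketched in Python as below, using the mean intensity primitive mentioned above as the only analysis function. The byte layout is an assumption made for illustration (a header holding the schema ID and the total feature data size, followed by one record per primitive holding its ID, value count and values); the actual feature vector format is not specified here, and `analyze`, `mean_intensity` and `schema_table` are hypothetical names.

```python
import struct


def mean_intensity(pixels, width, height):
    """Analysis function for a one-dimensional primitive: average pixel value."""
    return [sum(pixels) / (width * height)]


def analyze(schema_id, schema_table, pixels, width, height):
    """Build a feature vector: header, then feature data for each primitive."""
    feature_data = b""
    for primitive_id, analysis_fn in schema_table:      # loop of states 414-420
        values = analysis_fn(pixels, width, height)     # state 418
        feature_data += struct.pack(f"<II{len(values)}d",
                                    primitive_id, len(values), *values)
    # The header is written last here because the size is known only at the end
    # (states 412 and 422 combined).
    header = struct.pack("<II", schema_id, len(feature_data))
    return header + feature_data
```

A schema lookup table is represented as a list of (primitive ID, analysis function) pairs, so adding a primitive to the schema adds one record to every feature vector produced under it.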

Referring now to FIG. 11, the standard comparison pro- 
cess 396 shown in FIG. 9 will be described. In a manner 
similar to the analysis process 366 previously described, a 
comparison function for each custom primitive must be 
registered through the primitive registration interface 306 
(FIG. 6). The registered comparison functions are stored in 
the schema lookup table. The input utilized by the standard 
comparison process 396 includes two feature vectors to be 
compared and weights for each primitive. If the primitives 
for each of the two feature vectors are the same, the standard 
comparison is considered to be a homogeneous comparison. 
However, if each of the two feature vectors is associated 
with a different schema, but has at least one primitive in 
common between the two feature vectors, the comparison is 
considered to be a heterogeneous comparison. As will be 
seen below, the standard comparison process 396 accom- 
plishes either type of comparison. 
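
The difference between the two cases can be sketched in Python as follows. The sketch assumes feature vectors are dictionaries keyed by primitive ID and uses a Euclidean distance as one possible comparison function; `standard_compare`, `euclidean` and `compare_fns` are hypothetical names rather than the engine's API.

```python
import math


def euclidean(a, b):
    """Distance between two equal-length lists of feature values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standard_compare(fv1, fv2, weights, compare_fns):
    """Score only the primitives that are present in both feature vectors."""
    scores = {}
    for pid, data1 in fv1.items():
        if pid not in fv2:
            continue                      # primitive not in common: skip it
        scores[pid] = compare_fns[pid](data1, fv2[pid])
    overall = sum(weights[pid] * s for pid, s in scores.items())
    return overall, scores
```

When both feature vectors carry the same primitives, every primitive is scored and the comparison is homogeneous; when they come from different schemas, only the shared primitives contribute to the overall score.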

Beginning at a start comparison state 440, the comparison
process 396 moves to a state 442 to construct a score
structure for the comparison. The score structure is
initialized to be an empty score structure at this point. The score
structure contains space for one score per primitive plus an
overall score. Proceeding to state 446, the comparison
process 396 accesses a primitive in feature vector 1 (FV1),
which is associated with the first of the two images being
compared by the comparison process. For instance, FV1
may be the result of analyzing the target image. Moving to
a decision state 448, the comparison process 396 determines
if the primitive accessed in state 446 exists in feature vector
2 (FV2), which is associated with the second of the two
images being compared. FV2 may, for instance, correspond
to a candidate image. If the same primitive does exist in
feature vector 2, the comparison process 396 proceeds to
state 450 to look up the comparison function for the current
primitive in the schema lookup table for FV1. Continuing at
state 452, the feature data associated with the current
primitive from both feature vector 1 and feature vector 2 is
unpacked. Recall that each feature vector is a concatenation
of feature data elements corresponding to the set of
primitives in the schema. Advancing to state 454, the compare
function accessed at state 450 is invoked and receives the
feature data unpacked at state 452. The result of calling and
executing the compare function at state 454 is a primitive
score. An exemplary primitive having only one dimension or
property is mean intensity. In this example, the distance or
primitive score between feature vector 1 and feature vector
2 could be (X1 - X2). For primitives having multiple
dimensions, such as texture which may have as many as 35
dimensions, the presently preferred embodiment uses a
Euclidean metric. An equation for an exemplary Euclidean
metric is as follows:

$$ d = \left[ \sum_{i} \left( X1_i - X2_i \right)^2 \right]^{1/2} $$

Other techniques to determine the primitive score, such as
histogram intersection or other histogram techniques, may
be used in other embodiments.

Moving to state 456, the primitive score or feature score
is placed into the score structure constructed at state 442
above. Continuing at a decision state 458, the comparison
process 396 determines if there are additional primitives in
feature vector 1 that need to be processed. If so, the
comparison process 396 moves back to state 446 to access
the next primitive in feature vector 1. A loop of states 446
through 458 is performed until all primitives in feature
vector 1 have been processed. When decision state 458
determines that all primitives have been processed in feature
vector 1, comparison process 396 proceeds to state 460
wherein the scores stored in the score structure are combined
with the weights 400 (FIG. 9) for each of the primitives
passed into the comparison process to generate a final
combined score. The final combined score may be generated
by a linear combination or a weighted sum:

$$ S = \sum_{i} w_i s_i $$

The comparison process 396 completes at a done state 462.

Returning to decision state 448, if the current primitive
that is accessed in feature vector 1 at state 446 does not exist
in feature vector 2, comparison process 396 moves down to
decision state 458 to determine if additional primitives exist
in feature vector 1, thereby bypassing calling the compare
function for the current primitive of feature vector 1. This
allows feature vectors from different schemas to be
compared but the comparison is only on primitives that are in
common between the feature vectors. If all the primitives
between the two feature vectors are in common, the
comparison will be done for each of the primitives and is a
homogeneous comparison.

Referring to FIG. 12, the threshold comparison process
398 previously shown in FIG. 9 will now be described. The
threshold based comparison 398 allows significant
performance benefits to be gained by exploiting the primitive
architecture of the VIR engine to intelligently process the
comparison. Comparisons proceed by computing the
primitive comparison scores for the most heavily weighted
primitives first, and trying to prove as soon as possible that the
threshold has been exceeded. If the threshold is exceeded,
the rest of the primitive comparisons are then aborted.
Similar to the standard comparison process 396, previously
described, two feature vectors and corresponding weights
are input to the threshold comparison process. An additional
input is a threshold value, preferably in the range of 0 to 100.
The threshold comparison process 398 also performs both
homogeneous compares and heterogeneous compares (as
done by the Standard Compare). The threshold comparison
process 398 can be performed on both the Base VIR Engine
and the Extensible VIR Engine. However, the Base VIR
Engine may also perform a heterogeneous compare. In one
preferred embodiment, a heterogeneous compare can be
performed only if at least one of the schemas utilizes a
subset of the default primitives.

Beginning at a start comparison state 480, the threshold
comparison process 398 proceeds to state 482 to construct a
score structure for the comparison. The score structure is
initialized to be empty. Continuing at a state 484, the
primitives of feature vector 1 (FV1), in the presently
preferred embodiment, are ordered by weights, with the highest
weighted primitive ordered first and the lowest weighted
primitive ordered last. A cost is optionally associated with
each primitive to further order the primitives. The costs were
previously described in the query optimization description.
The cost value and the weight can be combined by a
developer-defined function to order the primitives. For
example, the function could be multiplication. As another
example, if the costs are normalized to [0 . . . 1] beforehand,
a Maximum function can be used as follows:
Max((1.0 - cost), weight). In another embodiment, only the
costs are used to order primitives.

Proceeding to state 486, the highest weighted primitive in
feature vector 1 is accessed. Subsequent states 488 through
496 are similar to states 448 through 456 of the standard
comparison process 396 shown in FIG. 11, and thus will not
be described in detail here. If the primitives of the two
feature vectors are in common, the comparison function for
the primitive is called (state 494) and the primitive score is
computed and stored in the score structure at state 496.
Moving to state 498, a partial final score is computed using
the weights and the scores stored in the score structure so far.

Moving to a decision state 500, the threshold comparison
process 398 determines if the partial final score, also known
as a weighted primitive score, exceeds the threshold passed
into the comparison process 398. If the threshold has not
been exceeded, as determined at decision state 500, the
comparison process 398 continues at a decision state 502 to
determine if there are additional primitives to be processed.
If there are additional primitives to be processed, threshold
comparison process 398 moves back to state 486 to access
the next highest ordered primitive in feature vector 1. A loop
of states 486 through 502 continues until all primitives in
feature vector 1 are processed unless the threshold has been
exceeded as determined at decision state 500. If the
threshold has been exceeded at decision state 500, the threshold
comparison process 398 aborts the loop, moves to done state
506 and returns with an indication that the threshold has
been exceeded.

Returning to decision state 502, if all primitives in feature
vector 1 have been processed, threshold comparison process
398 moves to state 504 to determine a final combined score.
State 504 is optional if the score from state 498 has been
saved. If the score has not been saved, the final score is
computed using the scores stored in the score structure and
the weights. The threshold comparison process 398 returns
with a normal indication at the completion of state 504 and
completes at the done state 506.

Referring to FIG. 13, a schema creation and primitive
registration process 520 will be described. This logic is
executed by the application. A developer may typically
create a new schema for a certain domain of objects or
images. Examples of domains where new schemas may be
created include face recognition, mammography,
ophthalmological images and so forth. As previously described,
each custom primitive requires a primitive ID, a label, an
analysis function, a compare function, a swap (endian)
function and a print function. This process 520 is a portion
of the primitive registration interface 306 (FIG. 6).

Beginning at a start state 522, the schema creation process
520 proceeds to state 524 to create a new schema. Creating
a new schema is a function of the extensible VIR engine 300.
The output of state 524 is a schema ID which allows the
registered primitives to be identified. The results of state 524
also include an empty schema structure, which includes the
schema ID. Moving to state 526, a primitive desired for this
schema is added to the schema structure. Adding the
primitive to the schema is a function of the extensible VIR engine
300. Moving to a decision state 528, the schema creation
process 520 determines if another primitive is to be added to
the current schema. If so, process 520 moves back to state
526 to add the next desired primitive to the schema. When
all desired primitives have been added to the schema as
determined at decision state 528, schema creation process
520 completes at a done state 530. At this point, a final
schema table identified by the schema ID and including all
the desired primitives has been created. The desired
primitives may include any custom primitives or any of the
default or standard primitives, such as global color, provided
in a library.

Referring to FIG. 14, the top "N" query process 550 will
now be described. The top N query is an exemplary usage of
the threshold comparison 398 by an application to provide a
performance gain. The top N query process 550 is used in a
search where a fixed number of results "N" is desired and N
is known beforehand, e.g., N is provided by the application
program. When N is small compared to the size of the
database to be searched, the use of the threshold comparison
398 can result in a significant increase in speed of
processing. The inputs to this process 550 are the query target object
to be searched against represented by its feature vector
FV_target, the weights for the primitives in this feature
vector, and the desired number of results "N".

Beginning at a start state 552, query process 550 moves
to state 554 wherein initialization is performed: a query
results list is cleared to an empty state, a threshold variable
"T" is set to be 100 (the maximum value of the preferred
range [0 . . . 100]), and a result count variable "C" (the
number of results so far) is set to zero. The count C will be
in the range 0 ≤ C ≤ N. Proceeding to state 556, query process
550 accesses the feature vector FV_i for the first object in the
database store 132 (FIG. 9). The query process 550 then
calls the threshold compare process 398 (FIG. 12) which is
a function of both the extensible VIR Engine 300 and Base
VIR engine 120. The feature vectors for the target object
(FV_target) and the current object (FV_i) (from state 556)
along with the primitive weights and the threshold T are all
passed in to the threshold process 398. Moving to a decision
state 560, the query process 550 determines if the return
code from the threshold compare process 398 is "normal"
(ok). If so, query process 550 proceeds to a decision state
562 to determine if the number of results so far is less than
the desired number of results (C<N). If so, query process
550 moves to state 564 to add the score S_i returned from the
threshold compare process 398 to the query results list in an
order that is sorted by score. The number of entries in the
sorted results list thereby increases by one and has C+1
entries. Moving to state 566, query process 550 increments
the result count C by one. Proceeding to a decision state 568,
the query process 550 determines if the number of results so
far is equal to the desired number of results (C=N). If so, the
query process 550 advances to state 570 wherein threshold
T is set equal to the score (score_N) of the Nth (last) result in
the sorted results list. The query process 550 continues at a
decision state 580 to determine if there are additional objects
having feature vectors (FV_i) in the database 132. If so, query
process 550 moves back to state 556 to access the next
feature vector in the database store 132. A loop of states
556-580 is executed until all the feature vectors in the
database store 132 have been processed, at which time the
query process 550 is finished at a done state 582.

Returning to the decision state 568, if the value of C does
not equal the value of N, the query process 550 proceeds to
the decision state 580 to determine if there are additional
feature vectors to process, as previously described. The
threshold T is not changed in this situation.

Returning to the decision state 562, if the value C is not
less than the value of N (i.e., C=N), the query process 550
continues at a decision state 572. At decision state 572, a
determination is made as to whether the score S_i returned
from the threshold compare process 398 is less than
threshold T (which is either the initialization value of 100 or the
score of result N of the sorted results list set by either state
570 or state 578 in a prior pass of the process 550). If not
(i.e., S_i is equal to or greater than T), query process 550
proceeds to the decision state 580 to determine if there are
additional feature vectors to process, as previously
described. However, if the score S_i is less than T, as
determined at decision state 572, the query process 550
proceeds to state 574 wherein the new result score S_i is
inserted into the results list sorted by score. At this time, the
results list temporarily has N+1 entries. Advancing to state
576, the query process 550 deletes the last result (N+1) in the
sorted results list. Moving to state 578, the query process
550 sets threshold T equal to the score (score_N) of the new
Nth (last) result in the sorted results list. The query process
550 continues at the decision state 580 to determine if there
are additional objects having feature vectors (FV_i) in the
database 132, as previously described.

Returning to the decision state 560, if the return code from
the threshold compare process 398 is "threshold exceeded",
the score for the current feature vector is ignored and the
query process 550 proceeds to the decision state 580 to
determine if there are additional feature vectors to process,
as previously described.

The output of the query process 550 is the sorted results
of the top N feature vectors. This output is sorted by score.

IV. APPLICATIONS

The VIR Engine directly implements the Visual
Information Model previously described and acts as the hub around
which all specific applications are constructed. The Engine
serves as a central visual information retrieval service that
fits into a wide range of products and applications. The
Engine has been designed to allow easy development of both
horizontal and vertical applications.

Vertical Applications

Because the facility of content-based image retrieval is 
generic, there is a large potential for developing the VIR 
technology in several vertical application areas, such as: 

digital studio 

document management for offices 
digital libraries 
electronic publishing 

face matching for law enforcement agencies 
medical and pharmaceutical information systems 
environmental image analysis 
on-line shopping 
design trademark searching 
internet publishing and searching 
remotely sensed image management for defense 
image and video asset management systems 
visual test and inspection systems 
To explain why the VIR technology is a central element 
in these applications, let us consider some application pos- 
sibilities in detail. 
Environmental Imaging 

Environmental scientists deal with a very large number of 
images. Agencies such as NASA produce numerous satellite 
images containing environmental information. As a specific 
example, the San Diego Bay Environmental Data Reposi- 
tory is geared towards an . . . 
"... understanding of the complex physical, biological 
and chemical processes at work in the Bay ... it is 
possible to correlate these different kinds of data in both 
space and time and to present the data in a visual form 
resulting in a more complete picture of what is and 
what is not known about the Bay .... This is the kind 
of information that is required to assist decision makers 
in allocating scarce resources in more effective and 
informative monitoring programs by sharing data, 
eliminating redundant monitoring and reallocating 
resources to more useful and effective purposes. 
Another key component of this work is to provide all of 
these data and resultant analyses to the public-at-large 
. . . through the World-Wide-Web of the Internet."
(From the San Diego Bay Project home page) 
For such applications, the methods are applicable to any 
geographic area in the world. Many of the datasets for 
environmental information are in the form of directly cap- 
tured or computer-rendered images, which depict natural 
(mostly geological) processes, their spatial distribution, and 
time progression of measurands. It is a common practice for 
environmental scientists to search for similar conditions 
around the globe, which amounts to searching for similar 
images. 
Medical 

A significant amount of effort is being spent in nation- 
wide health care programs for early detection of cancer. 
Image comparison is one of the fundamental methods for 
detecting suspicious regions in a medical image. 
Specifically, consider a cancer-screening center where a 
large number of fine needle aspiration cytology (FNAC) 
tests are conducted daily for breast cancer. We can envision 
a system that uses the system's image-similarity techniques
to provide an intelligent screening aid for the practicing
cytologist. After the slide is prepared, it is scanned by a
camera-equipped microscope at different levels of
magnification. At each magnification level, the slide is compared to
a database of other slides (or an existing pre-annotated atlas)
at the same magnification, and similarity is computed in
terms of cell density, number of nuclei, shapes of nuclei, and
number of dividing cells. Suspicious regions of the slide are
presented to the cytologist for closer inspection. If nothing
suspicious is found, the system might suggest skipping the
next higher level of magnification. The cytologist could
always override the suggestion, but in general, it would save
the cytologist the tedium of scanning through the entire
slide, and thus increase his or her productivity.

Multimedia

Digital libraries of videos are becoming common due to 
the large number of sports, news, and entertainment videos 
produced daily. Searching capabilities for a video library
should allow queries such as "show other videos having
sequences like this one." If the query sequence has a car
chase in it, the system should retrieve all videos with similar
scenes and make them available to the user for replay. The
basic technology to achieve this relies on detection of edit
points (cuts, fade-ins, and dissolves), detection of camera
movements (pan and zoom), and characterization of a
segmented sub-sequence in terms of its motion properties.
Also needed is a smooth integration with a database system
containing textual information (such as the cast, director,
and shooting locations), and other library facilities for which
software products already exist.

V. APPLICATION DEVELOPMENT 

A present embodiment of the VIR Engine is delivered as 
a statically or dynamically linkable library for a wide variety 
of platforms (such as Sun, SGI, Windows, and Apple 
Macintosh). The library is database independent and
contains purely algorithmic code with no dependencies on file
systems, I/O mechanisms, or operating systems. The engine
does not impose a constraint on the mechanism used to
persistently store the image features. An application could
manage the data using a relational database, an
object-oriented database, or a simple file system approach. In this
way, the VIR Engine is highly portable, and can be
considered for specialized processors and embedded applications.
FIG. 7 shows the interaction between the Engine and other
components of an end-user application.

The VIR Engine is intended as an infrastructure around 
which applications may be developed. Image management, 
thumbnails, database interfaces, and user interfaces are the 
responsibility of the application developer. In particular,
persistent storage of feature vectors is up to the application. 
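
As a rough illustration of the simple file system approach, the sketch below stores one feature vector in its own file as a 4-byte length followed by the raw bytes, and reads it back. It assumes the feature vector can be treated as an opaque block of bytes whose size is the byte count returned by analysis. The function names save_feature and load_feature and the file layout are hypothetical and are not part of the VIR Engine API. The length is written in host byte order, so such files would not be portable across architectures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write a 4-byte length, then the raw feature bytes.
 * Returns 0 on success, -1 on any I/O failure. */
static int save_feature(const char *path, const unsigned char *data,
                        uint32_t count)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    int ok = fwrite(&count, sizeof count, 1, fp) == 1 &&
             fwrite(data, 1, count, fp) == count;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read back a blob written by save_feature.
 * Returns a malloc'ed buffer (caller frees) and sets *count,
 * or returns NULL if the file is missing or truncated. */
static unsigned char *load_feature(const char *path, uint32_t *count)
{
    assert(count != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    unsigned char *data = NULL;
    if (fread(count, sizeof *count, 1, fp) == 1) {
        /* Allocate at least one byte so a zero-length blob is not NULL. */
        data = malloc(*count ? *count : 1);
        if (data != NULL && fread(data, 1, *count, fp) != *count) {
            free(data);
            data = NULL;
        }
    }
    fclose(fp);
    return data;
}
```

An application might call save_feature with the feature pointer and byte count obtained from analysis, using a file name derived from its own image identifier, and call load_feature before a comparison.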

The VIR architecture has been designed to support both 
static images and video in a unified paradigm. The infra- 
structure provided by the VIR Engine can be utilized to 
address high-level problems as well, such as automatic, 
unsupervised keyword assignment, or image classification. 

While the above detailed description has shown, 
described, and pointed out the fundamental novel features of 
the invention as applied to various embodiments, it will be
understood that various omissions and substitutions and 
changes in the form and details of the system illustrated may 
be made by those skilled in the art, without departing from 
the intent of the invention. 

A sample application template (example program) is
provided as follows:

/*
 * example program
 *
 * Description: Example program
 *
 * This simple program exercises typical entry points in the Virage Image
 * Engine API.
 *
 * In particular, we illustrate:
 *
 * - Creating an Image Engine
 * - Creating a default schema
 * - Creating a media object from an array of pixels
 * - Analyzing a media object to create a feature vector
 * - Creating and setting a weights structure
 * - Comparing two feature vectors to produce a distance
 * - Proper destruction of the above objects
 *
 * Copyright (c) 1996 Virage, Inc.
 */


#include <stdlib.h>
#include <stdio.h>

#ifndef VIR_ENG_ENGINE_C_H
#include <eng_engine_c.h>
#endif

#ifndef VIR_VIRCORE_H
#include <vircore_c.h>
#endif

#ifndef VIR_IMG_IO_C_H
#include <img_io_c.h>
#endif

#ifndef VIR_IMG_PRIM_C_H
#include <img_prim_c.h>
#endif

#define WIDTH 128
#define HEIGHT 128

#define IMAGE1 "image1"
#define IMAGE2 "image2"

#define GLOBAL_WEIGHT 1.0
#define LOCAL_WEIGHT 0.5
#define TEXTURE_WEIGHT 0.3
#define STRUCTURE_WEIGHT 0.6

vir_engPrimitiveID
default_primitives[] = { VIR_GLOBAL_COLOR_ID,
                         VIR_LOCAL_COLOR_ID,
                         VIR_TEXTURE_ID,
                         VIR_STRUCTURE_ID };

vir_float
default_weights[] = { GLOBAL_WEIGHT,
                      LOCAL_WEIGHT,
                      TEXTURE_WEIGHT,
                      STRUCTURE_WEIGHT };

#define N_DEFAULT_WEIGHTS 4

/*
 * This convenience function creates a vir_medMedia object from
 * a file which contains raw WIDTH x HEIGHT RGB (interleaved) data,
 * and then computes a feature vector for the object. The feature
 * vector (and its size) are returned to the caller.
 *
 * For users of the Virage IRW module, there are numerous routines
 * for reading and writing standard file formats (i.e., gif, jpeg,
 * etc.) directly to/from Virage vir_medMedia objects.
 */
void
CreateAndAnalyzeMedia( const char * filename,
                       vir_engEngineH engine,
                       vir_engSchemaH schema,
                       vir_engFeatureVectorData ** feature,
                       vir_engByteCount * count )
{
    vir_MediaH media;
    vir_byte * data;
    vir_uint32 image_size;
    int bytes_read;
    FILE * fp;

    /********** >>>>> Begin Execution <<<<< **********/

    /* Open the file of raw pixels */
    fp = fopen(filename, "rb");
    if (fp == NULL)
    {
        fprintf(stderr, "Unable to open file %s\n", filename);
        exit(-1);
    }

    image_size = WIDTH * HEIGHT * 3;

    /* Create a buffer to hold the pixel values */
    data = (vir_byte *)malloc(image_size);
    if (data == NULL)
    {
        fprintf(stderr, "Problems allocating data buffer\n");
        exit(-1);
    }

    /* Read the pixels into the buffer and close the file */
    bytes_read = fread(data, sizeof(vir_byte), image_size, fp);
    fclose(fp);
    if (bytes_read != image_size)
    {
        fprintf(stderr, "Problems reading file %s\n", filename);
        exit(-1);
    }

    /* Create our media object from the buffer */
    if ( vir_imgCreateImageFromData( WIDTH, HEIGHT, data, &media ) != VIR_OK )
    {
        fprintf(stderr, "Problems creating image\n");
        exit(-1);
    }

    /* Free the data buffer. The media object has made a private copy */
    free(data);

    /* Now we analyze the media object and create a feature vector */
    if ( vir_engAnalyze(engine, schema, media, feature, count) != VIR_OK )
    {
        fprintf(stderr, "Problems analyzing image\n");
        exit(-1);
    }

    /* Now that we are done with the media object, we destroy it */
    if ( vir_DestroyMedia(media) != VIR_OK )
    {
        fprintf(stderr, "Problems destroying media\n");
        exit(-1);
    }
}

int
main(int argc,
     char * argv[] )
{
    vir_engFeatureVectorData * feature1;
    vir_engFeatureVectorData * feature2;
    vir_engByteCount count1;
    vir_engByteCount count2;
    vir_engEngineH engine;
    vir_engSchemaH schema;
    vir_float distance;
    vir_engWeightsH weights;

    /********** >>>>> Begin Execution <<<<< **********/

    /* We create a default image engine */
    if ( vir_imgCreateImageEngine( &engine ) != VIR_OK )
    {
        fprintf(stderr, "Problem creating image engine\n");
        exit(-1);
    }

    /* We create a default image schema */
    if ( vir_imgCreateDefaultSchema( vir_DEFAULT_SCHEMA_20, engine, &schema )
         != VIR_OK )
    {
        fprintf(stderr, "Problems creating schema\n");
        exit(-1);
    }

    /* Now we'll use our convenient function to create feature vectors
     * We don't bother checking return codes - the function bombs out
     * on any error condition...
     */
    CreateAndAnalyzeMedia( IMAGE1, engine, schema, &feature1, &count1 );
    CreateAndAnalyzeMedia( IMAGE2, engine, schema, &feature2, &count2 );

    /*
     * Now I have the feature vectors in hand - in a real application I might
     * choose to store them persistently - perhaps as a column in a relational
     * database, as part of an object in an OODB, or as part of the header of a
     * file format. In this toy example, we'll just compare these vectors
     * against each other and print out the visual distance between the images
     * that they represent... not very interesting, but illustrative at any
     * rate.
     */

    /* Create a weights structure. We initialize the weights to some arbitrary
     * values which we have #define'd above. In a real application, we would
     * probably get these weights from a user interface mechanism like a
     * slider, but again, this is just to illustrate the API...
     */
    if ( vir_engCreateAndInitializeWeights( default_primitives,
                                            default_weights,
                                            N_DEFAULT_WEIGHTS,
                                            &weights ) )
    {
        fprintf(stderr, "Problems setting / normalizing weights\n");
        exit(-1);
    }

    printf("Starting 500000\n");
    for ( int ii = 0; ii < 500000; ii++ )
    {
        vir_engCompare( engine, feature1, feature2, weights, &distance );
    }
    printf("Done.\n");

    /* Finally, we'll compare the two feature vectors and print out the
     * distance */
    if ( vir_engCompare( engine, feature1, feature2, weights, &distance ) !=
         VIR_OK )
    {
        fprintf(stderr, "Problems comparing the images\n");
        exit(-1);
    }

    fprintf(stdout, "The distance is %f!\n", distance);

    /* We're done with the feature vectors */
    if ( (vir_engDestroyFeatureVectorData(feature1) != VIR_OK ) ||
         (vir_engDestroyFeatureVectorData(feature2) != VIR_OK ) )
    {
        fprintf(stderr, "Problems destroying feature vector\n");
        exit(-1);
    }

    /* Clean up the schema */
    if ( vir_engDestroySchema(schema) != VIR_OK )
    {
        fprintf(stderr, "Problems destroying the schema\n");
        exit(-1);
    }

    /* Clean up the engine */
    if ( vir_engDestroyEngine(engine) != VIR_OK )
    {
        fprintf(stderr, "Problems destroying the engine\n");
        exit(-1);
    }

    return 0;
}
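
The sample program compares a single pair of feature vectors. A search application would repeat the comparison across a database, and the sketch below shows one way the top N query process 550 and the threshold compare process 398 could fit together. It is an illustration under stated assumptions, not the engine's implementation: each primitive is reduced to one value in [0, 100], the per-primitive distance is an absolute difference, the weights are assumed normalized to sum to 1, a lower score is assumed to mean a closer match, and primitives are visited in array order (the caller is assumed to have ordered them already). The names FeatureVec, Result, threshold_compare, and top_n_query are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N_PRIMS   4        /* hypothetical number of primitives */
#define MAX_SCORE 100.0f   /* top of the preferred range [0..100] */

typedef struct { float v[N_PRIMS]; } FeatureVec;  /* one value per primitive */
typedef struct { int id; float score; } Result;

/* Sum weighted per-primitive distances in array order. Returns 0
 * ("threshold exceeded") as soon as the running total passes t,
 * otherwise stores the final score and returns 1. */
static int threshold_compare(const FeatureVec *a, const FeatureVec *b,
                             const float *w, float t, float *score)
{
    float total = 0.0f;
    for (int p = 0; p < N_PRIMS; p++) {
        float d = a->v[p] - b->v[p];
        if (d < 0.0f)
            d = -d;
        total += w[p] * d;
        if (total > t)
            return 0;          /* remaining primitives are skipped */
    }
    *score = total;
    return 1;
}

/* Keep the n best (lowest) scores, sorted ascending.
 * results must have room for n + 1 entries. Returns the result count C. */
static int top_n_query(const FeatureVec *target, const FeatureVec *db,
                       int db_size, const float *w, int n, Result *results)
{
    assert(n > 0 && results != NULL);
    float t = MAX_SCORE;       /* threshold T starts at the maximum */
    int c = 0;                 /* result count C, 0 <= C <= N */
    for (int i = 0; i < db_size; i++) {
        float s;
        if (!threshold_compare(target, &db[i], w, t, &s))
            continue;          /* score ignored, move to next object */
        /* Insert into the list sorted by score (list may reach N + 1). */
        int k = c;
        while (k > 0 && results[k - 1].score > s) {
            results[k] = results[k - 1];
            k--;
        }
        results[k].id = i;
        results[k].score = s;
        c++;
        if (c > n)
            c = n;             /* delete the last (N+1) result */
        /* Assumption: T only tightens once the list holds N results. */
        if (c == n)
            t = results[n - 1].score;
    }
    return c;
}
```

With a target of {10, 10, 10, 10}, equal weights of 0.25, and N = 2, a database entry at {90, 90, 90, 90} contributes 20 from its first primitive alone, so it is rejected without examining the other three primitives once two closer matches have pushed the threshold below 20.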



What is claimed is: 

1. A method of visual object comparison for a database of
visual objects, comprising the steps of:

a) applying primitives to a first visual object to extract a
first feature vector, each primitive providing at least
one primitive value to the first feature vector;

b) applying primitives to a second visual object to extract
a second feature vector, each primitive providing at
least one primitive value to the second feature vector;

c) providing an ordering value for each primitive to order
the primitives;

d) comparing one of the primitive values from the first
feature vector with the corresponding primitive value
of the second feature vector according to the ordering
so as to obtain a primitive score;

e) applying a primitive weight to the primitive score to
determine a weighted primitive score;

f) summing the weighted primitive score into a summed
total score; and

g) repeating steps d-f until the summed total score crosses
a selected threshold.

2. The method defined in claim 1, wherein the repeating 
step alternatively repeats until all primitives in one of the 
feature vectors have been processed to produce a final score. 

3. The method defined in claim 1, wherein the ordering 
value is the primitive weight corresponding to each primi- 
tive. 

4. The method defined in claim 1, wherein the ordering 
value is a cost associated with the execution time of the 
primitive. 

5. The method defined in claim 1, wherein the ordering 
value is a combination of the primitive weight correspond- 
ing to each primitive and a cost associated with the execu- 
tion time of the primitive. 

6. The method defined in claim 1, wherein the order of the 
primitives is defined by a function which orders the primi- 
tives from least ordering value to greatest ordering value. 

7. The method defined in claim 6, wherein the function 
comprises multiplication. 



8. The method defined in claim 6, wherein the function is
defined as maximum((1.0-cost), weight). 

9. A software system for visual object comparison of a
database of visual objects, the system comprising:

means for applying primitives to a first visual object to
extract a first feature vector, each primitive providing at
least one primitive value to the first feature vector;

means for applying primitives to a second visual object to
extract a second feature vector, each primitive providing
at least one primitive value to the second feature
vector;

means for providing an ordering value for each primitive
to order the primitives; and

means for thresholding including:

a) comparing one of the primitive values from the first
feature vector with the corresponding primitive
value of the second feature vector according to the
ordering so as to obtain a primitive score,

b) applying a primitive weight to the primitive score to
determine a weighted primitive score,

c) summing the weighted primitive score into a
summed total score, and

d) repeating a-c until the summed total score meets a
selected threshold.

10. A program storage device storing instructions that
when executed by a computer perform a method for a 
threshold based visual object comparison of a database of 
visual objects, the method comprising: 

a) applying primitives to a first visual object to extract a 
first feature vector, each primitive providing at least 
one primitive value to the first feature vector; 

b) applying primitives to a second visual object to extract 
a second feature vector, each primitive providing at 
least one primitive value to the second feature vector; 

c) providing an ordering value for each primitive to order 
the primitives; 

d) comparing one of the primitive values from the first 
feature vector with the corresponding primitive value 
of the second feature vector according to the ordering 
so as to obtain a primitive score; 

e) applying a primitive weight to the primitive score to 
determine a weighted primitive score; 

f) summing the weighted primitive score into a summed 
total score; and 

g) repeating steps d-f until the summed total score meets
a selected threshold.